A SIMPLE METHOD FOR CONSTRUCTING ORTHOGONAL
POLYNOMIALS WHEN THE INDEPENDENT VARIABLE
IS UNEQUALLY SPACED¹


D. S. Robson


Cornell University 
Ithaca, New York, U.S.A.


INTRODUCTION 


Orthogonal polynomials are commonly used in the analysis of 
variance for the construction of orthogonal contrasts among equally 
spaced levels of a treatment factor. The existence of tables [1; 2] giving 
the compounding coefficients for these particular contrasts often 
influences the experimenter to choose an equal spacing. A simple 
procedure for constructing orthogonal polynomials when the levels are 
unequally spaced frees the experimenter from this sometimes un- 
desirable restriction. 


CONSTRUCTION PROCEDURE 
The least squares regression equation of degree $r$
may be expressed in the form

$$\hat Y_i = b_0 f_0(x_i) + b_1 f_1(x_i) + \cdots + b_r f_r(x_i) \qquad (1)$$

where $f_j(x_i)$ is a polynomial of degree $j$ in $x_i$ and where $f_0, f_1, \ldots, f_r$
are normal orthogonal functions; i.e., where

$$\sum_{i=1}^{n} f_j(x_i) f_k(x_i) = \begin{cases} 1, & j = k,\\ 0, & j \neq k. \end{cases} \qquad (2)$$

The least squares regression coefficients $b_j$ in (1) are linear functions of
the treatment yields $Y_1, \ldots, Y_n$ of the form

$$b_j = \sum_{i=1}^{n} Y_i f_j(x_i) \qquad (3)$$

for all $j$, and represent the orthogonal contrasts commonly made in the
analysis of variance when the treatment factor occurs at levels
$x_1, \ldots, x_n$. Thus, $b_0$ is the zero degree or "mean effect," $b_1$ is the
"linear effect," $b_2$ the "quadratic effect," etc., and the sum of the
squares of the treatment totals $Y_1, \ldots, Y_n$ is

$$\sum_{i=1}^{n} Y_i^2 = \sum_{j=0}^{n-1} b_j^2. \qquad (4)$$

¹Paper No. 353, Department of Plant Breeding, Cornell University, Ithaca, New York, and No.
BU-43, Biometrics Unit, Plant Breeding Department.
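The construction of the functions $f_r(x_i)$ is accomplished recursively
from the relation

$$c_r f_r(x_i) = x_i^r - \sum_{j=0}^{r-1}\Big[\sum_{l=1}^{n} x_l^r f_j(x_l)\Big] f_j(x_i) \qquad (5)$$

where $c_r$ is the normalizing constant

$$c_r = \Big\{\sum_{i=1}^{n}\Big[x_i^r - \sum_{j=0}^{r-1}\Big(\sum_{l=1}^{n} x_l^r f_j(x_l)\Big) f_j(x_i)\Big]^2\Big\}^{1/2}.$$

This relation is easily proved by noting that if $Y_i = x_i^r$, $i = 1, \ldots, n$,
the least squares $r$th degree polynomial will give a perfect fit, so for
each $r$, $0 \le r \le n - 1$,

$$x_j^r = \sum_{k=0}^{r}\Big[\sum_{i=1}^{n} x_i^r f_k(x_i)\Big] f_k(x_j)$$

for $j = 1, \ldots, n$. Since conditions (2) and (3) are equivalent, the
recursive solution (5) of this system of equations gives precisely the unique
normal orthogonal polynomials. This is the method used
by Fisher [3] for the case of equally spaced $x$'s.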

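Thus, for $r = 0$,

$$c_0 f_0(x_i) = 1 \quad\text{or}\quad f_0(x_i) = 1/\sqrt{n}.$$

For $r = 1$,

$$c_1 f_1(x_i) = x_i - \Big(\sum_{l=1}^{n} x_l/\sqrt{n}\Big)\frac{1}{\sqrt{n}} = x_i - \bar x,$$

or

$$f_1(x_i) = \frac{x_i - \bar x}{\sqrt{\sum_{l=1}^{n} (x_l - \bar x)^2}}.$$

The function $f_2$ is then determined from $f_0$ and $f_1$ by

$$c_2 f_2(x_i) = x_i^2 - \Big[\sum_{l=1}^{n} x_l^2 f_0(x_l)\Big] f_0(x_i) - \Big[\sum_{l=1}^{n} x_l^2 f_1(x_l)\Big] f_1(x_i)$$

and so on.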
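The recursion (5) is easy to mechanize. The sketch below is illustrative only, not code from the paper: it assumes Python 3.9+ (for multi-argument `math.gcd` and `math.lcm`) and distinct levels, and the function name `scaled_orthogonal_polynomials` is hypothetical. It uses exact `fractions.Fraction` arithmetic so that the "cancellation of common factors" used in the example below can be done exactly, and returns, for each degree, the smallest-integer multiples of $c_r f_r(x_i)$ together with their sums of squares (the divisors).

```python
from fractions import Fraction
from math import gcd, lcm


def scaled_orthogonal_polynomials(levels):
    """Orthogonal polynomials of degree 0..n-1 at the given (distinct) levels.

    Follows recursion (5): c_r f_r(x_i) is x_i**r minus its projections on
    f_0, ..., f_{r-1}.  Each c_r f_r column is then rescaled to the smallest
    integers; the divisor is the sum of squares of those integers.
    """
    xs = [Fraction(x) for x in levels]
    previous = []                      # exact, unnormalized c_j f_j(x_i) vectors
    coefficients, divisors = [], []
    for r in range(len(xs)):
        power = [x ** r for x in xs]
        v = power[:]
        for u in previous:
            # [sum_l x_l^r f_j(x_l)] f_j(x_i), written with f_j = u / ||u||
            scale = sum(p * q for p, q in zip(power, u)) / sum(q * q for q in u)
            v = [vi - scale * ui for vi, ui in zip(v, u)]
        previous.append(v)
        # Cancel common factors: clear denominators, then divide by the gcd.
        m = lcm(*(vi.denominator for vi in v))
        ints = [int(vi * m) for vi in v]
        g = gcd(*ints)
        ints = [i // g for i in ints]
        coefficients.append(ints)
        divisors.append(sum(i * i for i in ints))
    return coefficients, divisors


if __name__ == "__main__":
    coefs, divs = scaled_orthogonal_polynomials([0, 2, 5])
    print(coefs, divs)   # compare with the working coefficients for 0, 2, 5
```

Exact fractions are used instead of floating point so that the integer working coefficients come out exactly, as they would by hand.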

An alternative construction procedure, also recursive but requiring
the solution of $r$ linear equations for the construction of $f_r(x_i)$, is
described by A. Grandage [4] in answer to Biometrics Query 130. The
solution of these $r$ linear equations is obtained implicitly in the method
outlined above.


EXAMPLE 


Consider an experiment with a 3 × 4 factorial arrangement of
treatments, and suppose factor A is present at the three levels 0, 2, and 
5 and factor B at the four levels 0, 1, 3, and 6. The three orthogonal 
polynomials representing the mean, linear, and quadratic effects of 
factor A are computed in Table 1. 


TABLE 1
CONSTRUCTION PROCEDURE ILLUSTRATED FOR THE SPACING 0, 2, 5

j      x_j   x_j²   c_0f_0(x_j)   c_1f_1(x_j)          c_2f_2(x_j)
1       0     0         1         0 − 7/3 = −7/3        0 − 29/3 + 7(196)/114 = 270/114
2       2     4         1         2 − 7/3 = −1/3        4 − 29/3 + 1(196)/114 = −450/114
3       5    25         1         5 − 7/3 = 8/3        25 − 29/3 − 8(196)/114 = 180/114
Total   7    29
c_r                    √3         √114/3                [(90)²(38)/114²]^(1/2) = 90√38/114

After cancellation of common factors within each $c_r f_r(x_i)$ column,
these coefficients and divisors simplify to those shown in
Table 2.


TABLE 2
WORKING COEFFICIENTS AND DIVISORS FOR THE SPACING 0, 2, 5

j         x_j   c_0f_0(x_j)   c_1f_1(x_j)   c_2f_2(x_j)
1          0        1             −7             3
2          2        1             −1            −5
3          5        1              8             2
Divisor             3            114            38

Similarly, the coefficients and divisors for the mean, linear, quadratic,
and cubic effects of the B factor become, after simplification, the integers
shown in Table 3.

TABLE 3
WORKING COEFFICIENTS AND DIVISORS FOR THE SPACING 0, 1, 3, 6

j         z_j   c_0f_0(z_j)   c_1f_1(z_j)   c_2f_2(z_j)   c_3f_3(z_j)
1          0        1             −5             9            −5
2          1        1             −3            −3             9
3          3        1              1           −13            −5
4          6        1              7             7             1
Divisor             4             84           308           132

TABLE 4
COEFFICIENTS OF ORTHOGONAL POLYNOMIALS AND INTERACTIONS FOR THE
3 × 4 FACTORIAL ARRANGEMENT

Treatment  M   L_A  Q_A  L_B  Q_B  C_B  L_AL_B  Q_AL_B  L_AQ_B  Q_AQ_B  L_AC_B  Q_AC_B
a_0b_0     1   −7    3   −5    9   −5     35     −15     −63      27      35     −15
a_2b_0     1   −1   −5   −5    9   −5      5      25      −9     −45       5      25
a_5b_0     1    8    2   −5    9   −5    −40     −10      72      18     −40     −10
a_0b_1     1   −7    3   −3   −3    9     21      −9      21      −9     −63      27
a_2b_1     1   −1   −5   −3   −3    9      3      15       3      15      −9     −45
a_5b_1     1    8    2   −3   −3    9    −24      −6     −24      −6      72      18
a_0b_3     1   −7    3    1  −13   −5     −7       3      91     −39      35     −15
a_2b_3     1   −1   −5    1  −13   −5     −1      −5      13      65       5      25
a_5b_3     1    8    2    1  −13   −5      8       2    −104     −26     −40     −10
a_0b_6     1   −7    3    7    7    1    −49      21     −49      21      −7       3
a_2b_6     1   −1   −5    7    7    1     −7     −35      −7     −35      −1      −5
a_5b_6     1    8    2    7    7    1     56      14      56      14       8       2
The (Kronecker) product of the two matrices of coefficients given in
Tables 2 and 3 then gives the coefficients for the 12 orthogonal contrasts
among the 12 treatment totals as shown in Table 4. In this table,
$(a_x b_z)$ denotes the total (over $k$ replicates) yield of the treatment
combination of level $x$ of factor A and level $z$ of factor B.
The "Quadratic A × Linear B" ($Q_A L_B$) mean square, for example, is
then


$$[-15(a_0b_0) + 25(a_2b_0) - 10(a_5b_0) + \cdots + 21(a_0b_6) - 35(a_2b_6) + 14(a_5b_6)$$
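The Kronecker-product step that produces Table 4 can be checked numerically. This sketch assumes NumPy; the arrays simply restate the integer working coefficients of Tables 2 and 3 (rows are levels, columns are degrees), and `contrast_ss` is a hypothetical helper implementing the usual single-degree-of-freedom sum of squares for a contrast among totals of $k$ replicates, $(\sum c\,T)^2 / (k \sum c^2)$.

```python
import numpy as np

# Working coefficients (rows: levels; columns: mean, linear, quadratic[, cubic]).
A = np.array([[1, -7, 3],
              [1, -1, -5],
              [1, 8, 2]])           # factor A at 0, 2, 5 (Table 2)
B = np.array([[1, -5, 9, -5],
              [1, -3, -3, 9],
              [1, 1, -13, -5],
              [1, 7, 7, 1]])        # factor B at 0, 1, 3, 6 (Table 3)

# np.kron(B, A)[3*j + i, 3*q + p] == B[j, q] * A[i, p], so the rows run
# a0b0, a2b0, a5b0, a0b1, ... as in Table 4, and column 3*q + p is the
# contrast "degree p in A x degree q in B".
K = np.kron(B, A)
qa_lb = K[:, 3 * 1 + 2]             # Quadratic A x Linear B


def contrast_ss(coefs, totals, k):
    """Single-degree-of-freedom sum of squares for a contrast among
    treatment totals, each total being summed over k replicates."""
    coefs = np.asarray(coefs, dtype=float)
    totals = np.asarray(totals, dtype=float)
    return (coefs @ totals) ** 2 / (k * (coefs @ coefs))


print(qa_lb)           # the Q_A L_B column of Table 4
print(qa_lb @ qa_lb)   # sum of squared coefficients, 38 * 84
```

Because the columns of each factor's matrix are mutually orthogonal, so are all twelve columns of their Kronecker product.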
THE ESTIMATION OF ENVIRONMENTAL AND GENETIC
TRENDS FROM RECORDS SUBJECT TO CULLING*


C. R. HENDERSON
Animal Husbandry Department, Cornell University, Ithaca, N. Y., U.S.A.


OSCAR KEMPTHORNE
Statistical Laboratory, Iowa State College, Ames, Iowa, U.S.A.


S. R. SEARLE
Animal Husbandry Department, Cornell University, Ithaca, N. Y., U.S.A.


AND


C. M. VON KROSIGK
Animal Husbandry Department, Iowa State College, Ames, Iowa, U.S.A.


INTRODUCTION 


A very common problem which arises in animal or plant breeding 
research is that of assessing the gain which has resulted from a selection 
program carried on over a number of years. To be specific, let us 
suppose that we have a closed dairy herd which has been maintained 
over a number of years with selection being practiced. The records 
available for assessing any genetic improvement consist of production 
records of cows in the various years and can be represented by a two- 
way classification, cow by year. At first it might be thought that such 
a two-way classification could be analyzed by the method of fitting 
constants [Yates, 1934]. Applications of this technique have, however, 
led to the apparent conclusion that the environment gradually de- 
teriorated over the period of years, as indicated by the fact that the 
constants fitted for years tend to decrease year by year. 

Henderson [1949] pointed out that a least squares procedure in 
which the cow effects are regarded as fixed leads to biased estimates. 
Lush and Shrode [1950] gave a simple explanation of the biases arising 
in the estimation of age correction factors; similar considerations apply 
to the estimation of year effects. The present paper is the combined
result of work on this problem carried out over the past few years
independently at Iowa and Cornell.

*Journal Paper No. J-3458 of the Iowa Agricultural and Home Economics Experiment Station,
Ames, Iowa. Project No. 890.

During the summer of 1957 the two of us at Iowa (Kempthorne 
and von Krosigk), after preparing a paper for this journal, had corre- 
spondence and personal discussion with the two at Cornell (Henderson 
and Searle), from which it was felt that a combined paper would best 
suit our present state of knowledge of the problem. A method for 
estimating environmental trend when repeatability is assumed known 
was outlined by Henderson in mimeographed material that has been 
circulating for some years; a method not requiring an assumed value for 
repeatability and now being proposed by two of us (K. and v. K.) appeared
at first to differ, but when one of us (S.) showed that for given repeat- 
ability these two methods are equivalent, it seemed more suitable to 
present all the work in one publication rather than two separate ones. 
This paper therefore presents both methods and the manner in which 
they are equivalent, attributing the various sections to their appropriate 
authors. 


ORIGIN OF THE BIAS IN ROUTINE LEAST SQUARES 
(KEMPTHORNE AND VON KROSIGK)


Lush and Shrode [1950] have shown how bias enters into the esti- 
mation of age correction factors due to culling. Similar arguments 
apply when estimating year effects. These admit of easy explanation 
in the simple situation in which a cow's first-year record $x_i$ and her
second-year record $y_i$ conform to a bivariate normal distribution,
with a strictly additive difference $\gamma$ between first- and second-year
records, based on the model

$$x_i = \mu + c_i + e_{i1}, \qquad y_i = \mu + \gamma + c_i + e_{i2}.$$

The $e$'s are errors arising from differing environments among the same
animal's records from year to year, and from inaccuracies of measurement,
and $c_i$ is common to all records of cow $i$. The $c$'s and $e$'s are
random variables with zero means and variances $\sigma_c^2$ and $\sigma_e^2$ respectively,
and all covariances are zero. The expectation with regard to the errors
is over a hypothetical set of repetitions in the particular year with any
particular cow. The expectation with regard to the cow effects is over
the population of cows which could have entered the records, it being
hypothesized that the particular set entering the records is a random
sample of the possibilities that could have arisen. The fact that cows
in a herd will probably be somewhat related would vitiate the assumption
that the covariance between any two $c$'s is zero. We shall not, however,
extend the argument in this paper to take care of this situation.
It should be emphasized that we are supposing no age or lactation-
number effect, on the assumption that certain correction factors obtained 
apart from the data to be analyzed are appropriate. The validity of 
this assumption will be discussed at the end of the paper. 

With this model the herd average for the first year (mean value of $x$)
is $\mu$ and that for the second year (mean value of $y$) is $\mu + \gamma$. Both $x$ and $y$
have variance $\sigma_c^2 + \sigma_e^2$ and the covariance between them is $\sigma_c^2$,
so that, based on a bivariate normal distribution, the mean value of $y$
given $x$ is

$$\mu + \gamma + r(x - \mu),$$

where $r = \sigma_c^2/(\sigma_c^2 + \sigma_e^2)$, a ratio known in animal breeding work as
repeatability [Lush, 1949]. If we take the expectation of this conditional
mean over all the cows which had a first record we obtain, of
course, $\mu + \gamma$, the population mean for $y$. But culling has the effect
that the expectation of $x$ over the cows retained in the herd after the
first record is equal to a number, say $\mu'$, which would not in general be
equal to $\mu$ unless the culling were either based only on some attributes
statistically independent of $x$, or were a peculiar type of balanced culling
which did not affect the mean. The usual thing is probably some sort
of truncation selection, though probably based on an index of which $x$
is one component rather than on $x$ alone. It therefore follows that the
mean of the second-year records of cows which are retained is

$$\mu + \gamma + r(\mu' - \mu).$$


In this simple case the method of least squares gives an estimate of
the year difference $\gamma$ as the average of second-year records minus the
average of first-year records of those cows that had a second-year
record, and this estimate has expected value

$$\mu + \gamma + r(\mu' - \mu) - \mu' = \gamma - (1 - r)(\mu' - \mu).$$

This is the paired comparison method of estimation (method B of Lush
and Shrode); with the usual type of selection $\mu'$ will be greater than $\mu$,
so this estimate of the year difference is biased downward to the extent

$$(1 - r)(\mu' - \mu).$$

With the gross comparison method (method A of Lush and Shrode) the
year difference is estimated by the difference between the mean of all
second-year records and the mean of all first-year records; such an
estimate has expectation

$$\mu + \gamma + r(\mu' - \mu) - \mu = \gamma + r(\mu' - \mu),$$

and hence is biased upward by an amount $r(\mu' - \mu)$.
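A small Monte Carlo sketch makes the two biases concrete. It assumes NumPy and the bivariate model above with $\sigma_c^2 = \sigma_e^2 = 1$ (so $r = 1/2$) and $\gamma = 0$; as an illustrative culling rule it retains the upper half of cows on their first record. None of these settings comes from the paper, and `simulate_year_difference` is a hypothetical name.

```python
import numpy as np


def simulate_year_difference(n_cows=200_000, mu=0.0, gamma=0.0,
                             var_c=1.0, var_e=1.0, keep=0.5, seed=0):
    """Return (paired, gross, mu_prime - mu, r) for one simulated herd."""
    rng = np.random.default_rng(seed)
    c = rng.normal(0.0, np.sqrt(var_c), n_cows)           # cow effects
    x = mu + c + rng.normal(0.0, np.sqrt(var_e), n_cows)  # first-year records
    # Second-year records; only those of retained cows are used below.
    y = mu + gamma + c + rng.normal(0.0, np.sqrt(var_e), n_cows)
    retained = x >= np.quantile(x, 1.0 - keep)            # cull on x (assumed rule)
    paired = y[retained].mean() - x[retained].mean()      # method B
    gross = y[retained].mean() - x.mean()                 # method A
    shift = x[retained].mean() - mu                       # mu' - mu
    r = var_c / (var_c + var_e)
    return paired, gross, shift, r


paired, gross, shift, r = simulate_year_difference()
print(f"paired {paired:.3f} vs -(1-r)(mu'-mu) = {-(1 - r) * shift:.3f}")
print(f"gross  {gross:.3f} vs  r(mu'-mu)     = {r * shift:.3f}")
```

With $\gamma = 0$ the paired comparison should come out near $-(1-r)(\mu'-\mu)$ and the gross comparison near $+r(\mu'-\mu)$.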


METHOD I (HENDERSON)


Suppose the linear model for a cow's record is

$$y_{tik} = \mu + g_t + d_k + c_{ti} + e_{tik},$$

where $y_{tik}$ is the record in the $k$th year made by the $i$th cow of the $t$th
group of cows in a herd. These groups might for example represent
daughters of a bull, sets of cows born within a specified period, or groups
of cows that enter the herd together. $\mu$ is the population average, $d_k$
is the environmental effect of the $k$th year, $g_t$ is the mean real producing
ability of the $t$th group of cows, $c_{ti}$ is the real producing ability of the
$i$th cow of the $t$th group, and $e_{tik}$ is a random environmental effect
peculiar to the individual record. We will assume that the $c_{ti}$ are
normally and independently distributed with zero means and variance
$\sigma_c^2$, that the $e_{tik}$ are normally and independently distributed with zero
means and variance $\sigma_e^2$, and that the $c$'s and $e$'s are uncorrelated. These
assumptions imply that the cows of a particular group are randomly
drawn from a normal population with mean $\mu + g_t$ and variance $\sigma_c^2$.
Furthermore, temporary environment is not correlated with real producing
ability. The problem is to estimate differences among the $d$'s
and $g$'s assuming that repeatability is known.

The following method for maximum likelihood estimation of fixed 
elements of mixed linear models has been derived by one of us (H.). 

Let the mixed linear model be

$$y = X\beta + Zu + e,$$

where $\beta$ is a vector of fixed effects, while $u$ and $e$ are independent vectors
of variables that are normally distributed with zero means and variance-covariance
matrices $D\sigma^2$ and $R\sigma^2$ respectively. Then $y$ has a multivariate
normal distribution with mean $X\beta$ and variance-covariance matrix
$(R + ZDZ')\sigma^2$, and the m.l. estimator of $\beta$, say $\tilde\beta$, is the solution to

$$X'(R + ZDZ')^{-1}X\tilde\beta = X'(R + ZDZ')^{-1}y \qquad (1)$$

assuming that the coefficient matrix is non-singular and that $\beta$ is estimable.
The difficulty in applying this method is that $R + ZDZ'$ is,
in practice, often large and non-diagonal.


Now the same estimator can be obtained by maximizing, for variation
in $\beta$ and $u$, the joint density function of $y$ and $u$. This function is

$$f(y, u) = \text{Const.}\times\exp\Big\{-\frac{1}{2\sigma^2}\Big[(y - X\beta - Zu)'R^{-1}(y - X\beta - Zu) + u'D^{-1}u\Big]\Big\}.$$

Differentiating this with respect to $\beta$ and $u$ and equating to zero gives
the following equations:

$$X'R^{-1}X\hat\beta + X'R^{-1}Z\hat u = X'R^{-1}y$$
$$Z'R^{-1}X\hat\beta + (Z'R^{-1}Z + D^{-1})\hat u = Z'R^{-1}y$$

These equations except for $D^{-1}$ are identical to those for the m.l. estimation
of $\beta$ and $u$ regarding $u$ as fixed. In many problems the equations
are easy to write since $R$ and $D$ are diagonal.

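To prove that $\hat\beta$ and $\tilde\beta$ are identical we eliminate $\hat u$, obtaining

$$X'WX\hat\beta = X'Wy,$$

where

$$W = R^{-1} - R^{-1}Z(Z'R^{-1}Z + D^{-1})^{-1}Z'R^{-1}.$$

Now, if it can be shown that $W = (R + ZDZ')^{-1}$, these equations are
identical to those given in (1) and then $\hat\beta = \tilde\beta$. We show this by proving
that $(R + ZDZ')W = I$:

$$\begin{aligned}
(R + ZDZ')W &= (R + ZDZ')\big[R^{-1} - R^{-1}Z(Z'R^{-1}Z + D^{-1})^{-1}Z'R^{-1}\big]\\
&= I + ZDZ'R^{-1} - Z(Z'R^{-1}Z + D^{-1})^{-1}Z'R^{-1} - ZDZ'R^{-1}Z(Z'R^{-1}Z + D^{-1})^{-1}Z'R^{-1}\\
&= I + ZDZ'R^{-1} - Z(I + DZ'R^{-1}Z)(Z'R^{-1}Z + D^{-1})^{-1}Z'R^{-1}\\
&= I + ZDZ'R^{-1} - ZD(D^{-1} + Z'R^{-1}Z)(Z'R^{-1}Z + D^{-1})^{-1}Z'R^{-1}\\
&= I + ZDZ'R^{-1} - ZDZ'R^{-1}\\
&= I,
\end{aligned}$$

thus completing the proof.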
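The equivalence proved above can be spot-checked numerically. This sketch assumes NumPy and uses small random matrices (a diagonal $R$ and $D$ and a random incidence matrix $Z$) rather than any data from the paper. It solves (1) directly, solves the mixed-model equations, and confirms that $(R + ZDZ')W = I$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 30, 3, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # fixed-effect design
Z = np.eye(q)[rng.integers(0, q, size=n)]      # each record in one random-effect level
R = np.diag(rng.uniform(0.5, 2.0, size=n))     # assumed diagonal R
D = np.diag(rng.uniform(0.2, 1.0, size=q))     # assumed diagonal D
y = rng.normal(size=n)

R_inv = np.linalg.inv(R)
V = R + Z @ D @ Z.T

# Equation (1): needs the inverse of the full, non-diagonal R + ZDZ'.
V_inv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)

# Mixed-model equations: only R^{-1} and D^{-1} are required.
lhs = np.block([[X.T @ R_inv @ X, X.T @ R_inv @ Z],
                [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + np.linalg.inv(D)]])
rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])
beta_mme = np.linalg.solve(lhs, rhs)[:p]

# W obtained by absorbing u-hat; the proof shows (R + ZDZ')W = I.
W = R_inv - R_inv @ Z @ np.linalg.inv(Z.T @ R_inv @ Z + np.linalg.inv(D)) @ Z.T @ R_inv
print(np.allclose(beta_gls, beta_mme), np.allclose(V @ W, np.eye(n)))
```

The practical point of the method shows up in the sizes involved: the mixed-model equations only invert diagonal matrices and a system whose order is the number of fixed plus random effects, not the number of records.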


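In applying this method to the present problem $\mu$, the $d$'s, and the $g$'s
correspond to $\beta$, and the $c$'s correspond to $u$. The joint distribution of the
$y$'s and $c$'s is

$$f(y, c) = \text{Const.}\times\exp\Big\{-\frac{1}{2\sigma_e^2}\sum_{t,i,k} n_{tik}(y_{tik} - \mu - g_t - d_k - c_{ti})^2 - \frac{1}{2\sigma_c^2}\sum_{t,i} c_{ti}^2\Big\}.$$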
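The maximizing values for variations in $\mu$, $g$'s, $d$'s, and $c$'s are the
solutions to the following set of equations:

$$n_{...}\hat\mu + \sum_t n_{t..}\hat g_t + \sum_k n_{..k}\hat d_k + \sum_{t,i} n_{ti.}\hat c_{ti} = y_{...}, \qquad (2)$$

$$n_{..k}\hat\mu + \sum_t n_{t.k}\hat g_t + n_{..k}\hat d_k + \sum_{t,i} n_{tik}\hat c_{ti} = y_{..k}, \qquad (3)$$

and analogous equations for other years ($k$);

$$n_{t..}\hat\mu + n_{t..}\hat g_t + \sum_k n_{t.k}\hat d_k + \sum_i n_{ti.}\hat c_{ti} = y_{t..}, \qquad (4)$$

and analogous equations for other groups ($t$);

$$n_{ti.}\hat\mu + n_{ti.}\hat g_t + \sum_k n_{tik}\hat d_k + \Big(n_{ti.} + \frac{1 - r}{r}\Big)\hat c_{ti} = y_{ti.}, \qquad (5)$$

and analogous equations for other cows ($t$, $i$).
Here $n_{tik} = 1$ if the $i$th cow of the $t$th group has a record in the $k$th year, and is zero otherwise.
A dot in the subscript denotes summation over that particular subscript.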

It will be noted that the term $(1 - r)/r$ appears in the coefficient of
the $\hat c$'s in equations (5). This arises from the ratio $\sigma_e^2/\sigma_c^2$ and the definition
of repeatability, namely $\sigma_c^2/(\sigma_c^2 + \sigma_e^2)$. As used here, repeatability
is not intra-herd as Lush [1949] uses it, but rather intra-group (within
groups), and therefore depends on the choice of groups.

Solving equations (5) gives

$$\hat c_{ti} = \frac{n_{ti.}\,r}{1 + (n_{ti.} - 1)r}\,\big(\bar y_{ti.} - \hat\mu - \hat g_t - \bar d\big),$$

where $\bar y_{ti.} = y_{ti.}/n_{ti.}$ and $\bar d$ is the mean of the $\hat d_k$ associated with that particular cow.* Using these
expressions we can eliminate $\hat c_{ti}$ from equations (2), (3), and (4). These
reduced equations are of the form

$$\sum_k m_{kk'}\hat d_k + \sum_t m_{tk'}(\hat\mu + \hat g_t) = z_{k'} \qquad (6)$$

and analogous equations for other years ($k'$);

$$\sum_k m_{tk}\hat d_k + m_{t.}(\hat\mu + \hat g_t) = z_t \qquad (7)$$

and analogous equations for other groups ($t$).
The $m$'s of these equations are functions of the $n$'s, and the $z$'s are
functions of the $n$'s and $y$'s of equations (2), (3), (4), (5). As a check on
the computations it should be noted that

$$\sum_k m_{kk'} = \sum_t m_{tk'} \quad\text{for each year } (k'),$$
$$\sum_k m_{tk} = m_{t.} \quad\text{for each group } (t),$$
$$\sum_{k'} z_{k'} = \sum_t z_t.$$

It is shown later that these equations are equivalent to the m.l. equations
of Method II when $r$ is assumed known.

Now from (7)

$$\hat\mu + \hat g_t = \frac{z_t - \sum_k m_{tk}\hat d_k}{m_{t.}}.$$

Substituting this in (6) we get equations in $\hat d_k$ as follows:

$$\sum_k w_{kk'}\hat d_k = v_{k'}, \qquad (8)$$

where $w_{kk'} = m_{kk'} - \sum_t m_{tk}m_{tk'}/m_{t.}$ and $v_{k'} = z_{k'} - \sum_t m_{tk'}z_t/m_{t.}$,
and analogous equations for other years ($k'$). The fact that

$$\sum_k w_{kk'} = 0 \quad\text{for each year } (k')$$

and

$$\sum_{k'} v_{k'} = 0$$

can be used as a computational check. Because of this linear relationship
among the coefficients, (8) does not have a unique solution. Consequently,
we impose one constraint on the estimators, a convenient
one being $\hat d_f = 0$, where $f$ refers to the final year. To solve (8) with this
constraint we delete the last equation and the last unknown of the
remaining equations. Then $\hat d_k$ is the m.l. estimator of $d_k - d_f$, and
$\hat\mu + \hat g_t$ is the m.l. estimator of $\mu + g_t + d_f$. Therefore $\hat d_k - \hat d_{k'}$ is the
m.l. estimator of $d_k - d_{k'}$, and $(\hat\mu + \hat g_t) - (\hat\mu + \hat g_{t'})$ is the m.l. estimator
of $g_t - g_{t'}$.

*Animal breeders will be interested in the fact that $\hat c_{ti}$ is Lush's [1949] "most probable producing
ability" modified by correcting the records for years and then expressing the result as a deviation from
the group mean.

Use of an incorrect value of repeatability biases the estimates of 
environmental and genetic trends. For example, if too large a value 
of r is used and if cows that were culled had lower records than their 
contemporaries, the estimate of environmental trend is biased down- 
ward. 

If $r$ is known, the sampling variance-covariance matrix of $\hat\mu + \hat g_t$
and $\hat d_k$, when $\hat d_f = 0$, is $\sigma_e^2$ times the inverse of the coefficients of (6)
and (7) with the $f$th row and column deleted. The upper left submatrix
of this inverse can be obtained by inverting the coefficients of (8) with
the last row and last column deleted.

The following methods for testing hypotheses concerning the d’s 
and g’s are appropriate if r is assumed known. 

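The denominator sum of squares in the F tests is

$$\sum_{t,i,k} y_{tik}^2 - \sum_{t,i}\frac{y_{ti.}^2}{n_{ti.} + (1 - r)/r}\ [\text{from } (5)] - \sum_t \frac{z_t^2}{m_{t.}}\ [\text{from } (7)] - \sum_k \hat d_k v_k\ [\text{from } (8)].$$

This has degrees of freedom = number of records − number of years −
number of groups + 1. The numerator sum of squares for testing the
hypothesis that all $d_k$ are equal is

$$\sum_k \hat d_k v_k \quad [\text{from } (8)].$$

The numerator sum of squares for testing the hypothesis that all $g_t$
are equal is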


$$\sum_t \frac{z_t^2}{m_{t.}}\ [\text{from } (7)] + \sum_k \hat d_k v_k\ [\text{from } (8)] - \cdots$$

where the $d_k$ are solutions to equations (6) after first deleting $(\hat\mu + \hat g_t)$.


A NUMERICAL EXAMPLE OF METHOD I (HENDERSON) 


Suppose we have the following records, classified according to
group, cow in group, and year of freshening.

                       Year freshened
Group   Cow        1       2       3     Total
  1      1        450     420     410    1280
         2        500     470     500    1470
         3        410     430             840
         4        390                     390
  2      1                400     430     830
         2                430     380     810
         3                380             380
  3      1                        500     500
         2                        460     460
Total            1750    2530    2680    6960
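We shall take a value of 1/2 for $r$; then, as in (5), $(1 - r)/r = 1$ is
added to $n_{ti.}$ to form the diagonal coefficients of these equations (5).
Then the complete set of equations to be solved, namely those in (2)
through (5), is

$$\left[\begin{array}{rrrrrrrrrrrrrrrr}
16 & 4 & 6 & 6 & 9 & 5 & 2 & 3 & 3 & 2 & 1 & 2 & 2 & 1 & 1 & 1\\
4 & 4 & 0 & 0 & 4 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0\\
6 & 0 & 6 & 0 & 3 & 3 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 0\\
6 & 0 & 0 & 6 & 2 & 2 & 2 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 1 & 1\\
9 & 4 & 3 & 2 & 9 & 0 & 0 & 3 & 3 & 2 & 1 & 0 & 0 & 0 & 0 & 0\\
5 & 0 & 3 & 2 & 0 & 5 & 0 & 0 & 0 & 0 & 0 & 2 & 2 & 1 & 0 & 0\\
2 & 0 & 0 & 2 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1\\
3 & 1 & 1 & 1 & 3 & 0 & 0 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
3 & 1 & 1 & 1 & 3 & 0 & 0 & 0 & 4 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
2 & 1 & 1 & 0 & 2 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0\\
2 & 0 & 1 & 1 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0 & 0\\
2 & 0 & 1 & 1 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 0 & 3 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0\\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0\\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2
\end{array}\right]
\left[\begin{array}{c}
\hat\mu\\ \hat d_1\\ \hat d_2\\ \hat d_3\\ \hat g_1\\ \hat g_2\\ \hat g_3\\ \hat c_{11}\\ \hat c_{12}\\ \hat c_{13}\\ \hat c_{14}\\ \hat c_{21}\\ \hat c_{22}\\ \hat c_{23}\\ \hat c_{31}\\ \hat c_{32}
\end{array}\right]
=
\left[\begin{array}{r}
6960\\ 1750\\ 2530\\ 2680\\ 3980\\ 2020\\ 960\\ 1280\\ 1470\\ 840\\ 390\\ 830\\ 810\\ 380\\ 500\\ 460
\end{array}\right].$$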


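The $\hat c$'s are eliminated to obtain (6) and (7), as illustrated by the
coefficient of $\hat\mu + \hat g_1$ in the second equation of (6), which is

$$3 - \frac{1(3)}{4} - \frac{1(3)}{4} - \frac{1(2)}{3} = \frac{5}{6}.$$

The right hand member of the first equation reduces to

$$1750 - \frac{1(1280)}{4} - \frac{1(1470)}{4} - \frac{1(840)}{3} - \frac{1(390)}{2} - \frac{0(830)}{3} - \cdots - \frac{0(460)}{2} = \frac{3525}{6}.$$

Multiplying all equations by 6 to eliminate fractions gives the following
equations (6) and (7):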
ESTIMATION OF TRENDS 201 


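$$\begin{bmatrix}
16 & -5 & -3 & 8 & 0 & 0\\
-5 & 24 & -7 & 5 & 7 & 0\\
-3 & -7 & 23 & 3 & 4 & 6\\
8 & 5 & 3 & 16 & 0 & 0\\
0 & 7 & 4 & 0 & 11 & 0\\
0 & 0 & 6 & 0 & 0 & 6
\end{bmatrix}
\begin{bmatrix}\hat d_1\\ \hat d_2\\ \hat d_3\\ \hat\mu + \hat g_1\\ \hat\mu + \hat g_2\\ \hat\mu + \hat g_3\end{bmatrix}
=
\begin{bmatrix}3525\\ 4955\\ 5795\\ 6975\\ 4420\\ 2880\end{bmatrix}.$$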


Note that the sum of the first three equations equals the sum of the last 
three. 

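Now we eliminate $\hat\mu + \hat g_t$ to obtain (8). For example, the coefficient
of $\hat d_2$ in the first equation is

$$-5 - \frac{8(5)}{16} - \frac{0(7)}{11} - \frac{0(0)}{6} = -\frac{15}{2},$$

and the first right hand member is

$$3525 - \frac{8(6975)}{16} - \frac{0(4420)}{11} - \frac{0(2880)}{6} = \frac{75}{2}.$$

Then multiplying each equation by 528 we get

$$\begin{bmatrix}
6336 & -3960 & -2376\\
-3960 & 9495 & -5535\\
-2376 & -5535 & 7911
\end{bmatrix}
\begin{bmatrix}\hat d_1\\ \hat d_2\\ \hat d_3\end{bmatrix}
=
\begin{bmatrix}19{,}800\\ -19{,}755\\ -45\end{bmatrix}.$$

These are equations (8). Note that their sum is zero.
Imposing the restriction $\hat d_3 = 0$, the solution is

$$\hat d_1 = 2.468, \qquad \hat d_2 = -1.051,$$

and substituting in the second set of equations and solving for $\hat\mu + \hat g_t$
we get

$$\hat\mu + \hat g_1 = 435.032, \qquad \hat\mu + \hat g_2 = 402.487, \qquad \hat\mu + \hat g_3 = 480.000.$$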
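As a check on the arithmetic of this example, the sketch below builds the mixed-model equations directly from the sixteen records and solves them with NumPy, instead of eliminating by hand. Two assumptions: the year in which each record was made is taken from the records table above (consistent with the year totals and with the coefficient matrix of (2) through (5)), and the constraint $\hat d_3 = 0$ is imposed by dropping $d_3$ and absorbing $\mu$ into the group constants, so the fixed effects are $d_1$, $d_2$ and $\mu + g_t$.

```python
import numpy as np

# (group t, cow i, year k, record) for the sixteen records in the example.
records = [
    (1, 1, 1, 450), (1, 1, 2, 420), (1, 1, 3, 410),
    (1, 2, 1, 500), (1, 2, 2, 470), (1, 2, 3, 500),
    (1, 3, 1, 410), (1, 3, 2, 430),
    (1, 4, 1, 390),
    (2, 1, 2, 400), (2, 1, 3, 430),
    (2, 2, 2, 430), (2, 2, 3, 380),
    (2, 3, 2, 380),
    (3, 1, 3, 500), (3, 2, 3, 460),
]
r = 0.5
lam = (1 - r) / r                     # added to each cow's diagonal, as in (5)
cows = sorted({(t, i) for t, i, _, _ in records})

# Fixed effects: d_1, d_2 (d_3 = 0), then mu + g_1, mu + g_2, mu + g_3.
X = np.zeros((len(records), 5))
Z = np.zeros((len(records), len(cows)))
y = np.zeros(len(records))
for row, (t, i, k, value) in enumerate(records):
    if k < 3:
        X[row, k - 1] = 1.0
    X[row, 1 + t] = 1.0
    Z[row, cows.index((t, i))] = 1.0
    y[row] = value

lhs = np.block([[X.T @ X, X.T @ Z],
                [Z.T @ X, Z.T @ Z + lam * np.eye(len(cows))]])
rhs = np.concatenate([X.T @ y, Z.T @ y])
d1, d2, mg1, mg2, mg3 = np.linalg.solve(lhs, rhs)[:5]
print(f"d1={d1:.3f} d2={d2:.3f} mu+g: {mg1:.3f} {mg2:.3f} {mg3:.3f}")
```

Solving the full system at once and reducing it step by step through (6), (7) and (8) should agree, since the reduction is exact Gaussian elimination.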


METHOD II (KEMPTHORNE AND VON KROSIGK) 
The discussion of the origin of the bias in the method of fitting
constants leads to an obvious estimation procedure for the case of two
years. We use the following notation:


$\bar x$ = average of all first-year records,

$\bar x_r$ = average of first-year records of cows which are retained in the
herd,

$\bar y$ = mean of second-year records (of cows retained, of course).


The regression line of the joint distribution of the $y$'s and $x$'s is
estimated unbiasedly even if selection of any sort is made with regard
to the $x$'s, provided only that the true regression, or relationship to $x$
of the conditional mean of $y$ for given $x$, is linear throughout the entire
range.* This happens automatically if the joint distribution of $y$ and $x$
in the unselected data is bivariate normal. It follows that the line

$$\hat y - \bar y = \hat r(x - \bar x_r)$$

is an unbiased estimate of the line

$$y - (\mu + \gamma) = r(x - \mu),$$

where

$$\hat r = \frac{\sum (x - \bar x_r)(y - \bar y)}{\sum (x - \bar x_r)^2},$$

summations being over cows with both first- and second-year records.
It follows then that

$$\bar y + \hat r(\bar x - \bar x_r)$$

is an unbiased estimate of $\mu + \gamma$,
since $\bar x$ is an unbiased estimate of $\mu$ with errors independent of the
errors of estimation of $r$. Hence an unbiased estimate of $\gamma$ is

$$\hat\gamma = \bar y + \hat r(\bar x - \bar x_r) - \bar x = (\bar y - \bar x_r) - (1 - \hat r)(\bar x - \bar x_r).$$
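A simulated check of the two-year estimator $\hat\gamma$, as a sketch assuming NumPy. The herd size, the true $\gamma = 5$, $r = 1/2$ and the rule of retaining the upper half of cows on $x$ are illustrative choices, not values from the paper, and `two_year_gamma_hat` is a hypothetical name. Unlike the paired comparison, $\hat\gamma$ should land close to the true $\gamma$.

```python
import numpy as np


def two_year_gamma_hat(x_all, retained, y_retained):
    """Unbiased two-year estimate of gamma:
    (ybar - xbar_r) - (1 - r_hat)(xbar - xbar_r), where r_hat is the
    regression of second-year on first-year records of retained cows."""
    x_r = x_all[retained]
    xbar, xbar_r, ybar = x_all.mean(), x_r.mean(), y_retained.mean()
    dx = x_r - xbar_r
    r_hat = np.sum(dx * (y_retained - ybar)) / np.sum(dx * dx)
    gamma_hat = (ybar - xbar_r) - (1.0 - r_hat) * (xbar - xbar_r)
    return gamma_hat, r_hat


# Illustrative herd: sigma_c^2 = sigma_e^2 = 1 (r = 1/2), true gamma = 5,
# cows in the upper half on x retained (an assumed culling rule).
rng = np.random.default_rng(2)
n_cows, mu, gamma = 200_000, 100.0, 5.0
c = rng.normal(0.0, 1.0, n_cows)
x = mu + c + rng.normal(0.0, 1.0, n_cows)
y = mu + gamma + c + rng.normal(0.0, 1.0, n_cows)   # observed only for retained cows
retained = x >= np.median(x)

gamma_hat, r_hat = two_year_gamma_hat(x, retained, y[retained])
paired = y[retained].mean() - x[retained].mean()     # biased paired comparison
print(f"gamma_hat={gamma_hat:.3f}  r_hat={r_hat:.3f}  paired={paired:.3f}")
```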
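We shall find the variance conditionally on the $x$'s for which second
records exist. We have

$$\hat\gamma = \bar y + \hat r(\bar x - \bar x_r) - \bar x = (\bar y - \hat r\bar x_r) - (1 - \hat r)\bar x,$$
$$V(\hat\gamma) = V(\bar y - \hat r\bar x_r) + V[(1 - \hat r)\bar x] - 2\,\mathrm{Cov}[\bar y - \hat r\bar x_r,\ (1 - \hat r)\bar x]$$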


=2 2 2 
+ 2u Cov 7 — 


*A bias would occur if culling were based also on some attribute correlated with both $x$ and $y$.
This would occur if selection were based on dam's performance, for example. The bias can be seen to
be of the order of $r_1^2(1 - r)$, where $r_1$ is the correlation of the attribute with $x$ and $y$. This will be small
in many cases, and for the particular case discussed here.


V@) 


Il 


4 
a 
| 
with $S = \sum (x - \bar x_r)^2$ and $m$, $n$ equal to the total number of first
and second records respectively.


An unbiased estimate of this is given by 


2 2 2 
, &-® 8y 1282 
f) + + S + nS 


with s? 
and 


mean square among first records 
mean square about regression of second on first records and 
is an unbiased estimate of 0%, 


The unconditional variance would depend also on $\mathrm{Cov}(\bar x, \bar x_r)$, which
would be based on the culling procedure. 

The procedure given above leads to an unbiased estimate of the 
year effect but the estimate is not efficient in the sense of exhausting the 
information even in the data from two years with successive records, 
because we have estimated r only by the regression of second-year 
records on first-year records. Under the assumptions the mean square
of the totality of first-year records is an estimate of $\sigma^2$, while the mean
square of the second-year records about the fitted line is an estimate of
$\sigma^2(1 - r^2)$. So we could obtain an estimate of $r$,* apart from sign,
purely from these two mean squares. One problem of fitting is therefore 
to get the estimate of r which exhausts the data. Also data will extend 
over more than 2 years for many cows. 

For these reasons and because a solution can be obtained by the 
method of maximum likelihood, we shall now use this method to attack 
the general case. The material given above is to be regarded as a first 
approximation to an exhaustive solution and indicates the sort of thing 
that can happen when one deals with data culled on the basis of part 
of the actual data. The approximate solution given above will also be 
of use in obtaining the maximum likelihood solution, because this 
solution will be obtainable only by iterative solution of non-linear 
equations. 

As a framework for analysis of the general situation, suppose we
have $t$ records of a sample of the population of individuals and that
there is no systematic environmental effect and no culling. Then it
is reasonable to regard the $j$th record of the $i$th individual as being
given by the equation

$$y_{ij} = \mu + c_i + e_{ij},$$

in which the portion $c_i$ is common to all records of the $i$th cow. Also
we suppose that the $c_i$'s are normally and independently distributed
with mean zero and variance $\sigma_c^2$, that the $e_{ij}$ are normally and independently
distributed with mean zero and variance $\sigma_e^2$, and that the $e_{ij}$ and
$c_i$ are independent. Then we have, with $V$ denoting variance and Cov
denoting covariance,

$$V(y_{ij}) = \sigma_c^2 + \sigma_e^2 = \sigma^2, \qquad \mathrm{Cov}(y_{ij}, y_{ij'}) = \sigma_c^2 = r\sigma^2,$$

with

$$r = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_e^2}.$$

*This is independent of the regression estimate because the sum of squares of deviations of $y$
about its regression on $x$ is independent of that regression and of the $x$'s.


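The variance-covariance matrix of the $t$ records of an individual is
therefore

$$\sigma^2\begin{bmatrix} 1 & r & \cdots & r\\ r & 1 & \cdots & r\\ \vdots & \vdots & \ddots & \vdots\\ r & r & \cdots & 1\end{bmatrix},$$

i.e. $\sigma^2$ times a matrix with unity on the diagonal and $r$ in every off-diagonal
position. Let us now consider the regression of each record
on the preceding ones. We have the following relationships, in which
$z_{ij}$ equals $y_{ij} - \mu$: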


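$$z_{i2} = r z_{i1} + e_{i2.1}, \qquad V(e_{i2.1}) = \sigma^2(1 - r^2),$$

$$z_{i3} = \frac{r}{1 + r}(z_{i1} + z_{i2}) + e_{i3.12}, \qquad V(e_{i3.12}) = \sigma^2\,\frac{(1 - r)(1 + 2r)}{1 + r},$$

$$z_{i4} = \frac{r}{1 + 2r}(z_{i1} + z_{i2} + z_{i3}) + e_{i4.123}, \qquad V(e_{i4.123}) = \sigma^2\,\frac{(1 - r)(1 + 3r)}{1 + 2r},$$

$$z_{i5} = \frac{r}{1 + 3r}(z_{i1} + z_{i2} + z_{i3} + z_{i4}) + e_{i5.1234}, \qquad V(e_{i5.1234}) = \sigma^2\,\frac{(1 - r)(1 + 4r)}{1 + 3r},$$

and so on. These regression equations may be obtained by the use of
the method of least squares. As an example, to find the regression of
$z_{i4}$ on $z_{i1}$, $z_{i2}$, and $z_{i3}$ we have to fit the equation

$$z_{i4} = \beta_1 z_{i1} + \beta_2 z_{i2} + \beta_3 z_{i3},$$

which gives the normal equations

$$\begin{aligned}
\beta_1\sigma^2 + \beta_2 r\sigma^2 + \beta_3 r\sigma^2 &= r\sigma^2,\\
\beta_1 r\sigma^2 + \beta_2\sigma^2 + \beta_3 r\sigma^2 &= r\sigma^2,\\
\beta_1 r\sigma^2 + \beta_2 r\sigma^2 + \beta_3\sigma^2 &= r\sigma^2,
\end{aligned}$$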


to which the solution is

$$\beta_1 = \beta_2 = \beta_3 = \frac{r}{1 + 2r}.$$

The residual variance is the total variance of $z_{i4}$ minus the sum of
products of regression coefficients and right-hand sides of the normal
equations, here $\sigma^2 - 3r^2\sigma^2/(1 + 2r) = \sigma^2(1 - r)(1 + 3r)/(1 + 2r)$. In addition, from the general regression theorem that the
residual from a regression is uncorrelated with the "independent"
variables that are included in the regression, we know that

$e_{i2.1}$ is independent of $z_{i1}$,
$e_{i3.12}$ is independent of $z_{i1}$ and $z_{i2}$ and hence of $e_{i2.1}$,
$e_{i4.123}$ is independent of $z_{i1}$, $z_{i2}$, and $z_{i3}$ and hence of $e_{i2.1}$ and $e_{i3.12}$, and so on.
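The pattern in these regressions, a coefficient of $r/(1 + (j-1)r)$ on each of the $j$ preceding records and a residual variance of $\sigma^2(1 - r)(1 + jr)/(1 + (j-1)r)$, can be confirmed by solving the normal equations numerically for the equicorrelated matrix above. This is a sketch assuming NumPy; the function name is hypothetical.

```python
import numpy as np


def regression_on_previous(r, j, sigma2=1.0):
    """Regress record j+1 on records 1..j for the equicorrelated matrix
    sigma2 * [(1 - r) I + r J]. Returns (coefficients, residual variance)."""
    C = sigma2 * ((1 - r) * np.eye(j + 1) + r * np.ones((j + 1, j + 1)))
    beta = np.linalg.solve(C[:j, :j], C[:j, j])      # the normal equations
    resid = C[j, j] - beta @ C[:j, j]                # total variance minus sum of products
    return beta, resid


for j in range(1, 5):
    beta, resid = regression_on_previous(0.4, j)
    print(j, np.round(beta, 4), round(float(resid), 4))
```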


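The joint distribution of the $y_{ij}$, $j = 1, 2, \ldots, t$, with $i$ fixed is
therefore expressible as

$$\frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{z_{i1}^2}{2\sigma^2}\Big)\times\frac{1}{\sqrt{2\pi}\,\sigma\sqrt{1 - r^2}}\exp\Big(-\frac{e_{i2.1}^2}{2\sigma^2(1 - r^2)}\Big)\times\frac{1}{\sqrt{2\pi}\,\sigma}\sqrt{\frac{1 + r}{(1 - r)(1 + 2r)}}\exp\Big(-\frac{(1 + r)\,e_{i3.12}^2}{2\sigma^2(1 - r)(1 + 2r)}\Big)\times\cdots$$

We now apply these elementary distributional results, which are probably
widely known even if not stated explicitly somewhere in the
literature, to the problem at hand.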

We suppose that the first records of an individual are not subject to 
culling* at all and that individuals are culled after the first record on 
any basis. The existence of a second record usually depends in some 
way on the value of the first record, of the third record on the value of 


x 


*Any such culling is more properly to be regarded purely as selection of incoming animals and 
the effects of such selection will be included in genetic parameters to be estimated. 


| 
jt: 
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the first two records. We suppose that the group of individuals con- 
tributing first records in year ¢ deviates from the population mean 
systematically by an amount g, , and that records made in the kth 
year deviate from the population mean systematically by an amount 
d,. Wecan therefore write down the joint distribution of the totality 
of records as follows. Let y;;,, denote the jth record of individual 7 
which was made in year k, this individual having entered the herd in 
year t. Then the distribution of the y;;,. is 


F, X Poy X Faye X 


where 


2 
exp Gt d,) | 


1 
I] V 


i= denoting the product over all first records, and 


1 


| — — ge — — — aN") 
ed | 20°(1 — 1°) 


I ]:2) denoting the product over all second records, k’ 
denoting the year of the first record corresponding to the second 
record 


and 


= I] 


1 
2r° 
td li¢+r 


-exp | —4(Yisee — — Ge — — — — Ge — Ae) 
1+r 


2 27? 


I]:s) denoting the product over all third records, and k’, k”’ denoting 
the years in which the first and second records corresponding to the 
third record y;3,, were made, and so on. 

This joint distribution takes account of the fact that culling on 
prior records to an unknown extent was made. It incorporates, super- 
ficially at least, all the ingredients which one would like, namely 


$\mu$, the mean in an arbitrary base population;

$g_t$, the amount by which the group of individuals entering the herd in
year $t$ deviates from the base population;

$d_k$, the amount by which the records made in year $k$ deviate from those
made in an arbitrary base year;

$r$, the repeatability;

$\sigma^2$, the variance of records of a population of unculled individuals.


The only regrettable feature of the joint distribution is the assumption
that $\sigma^2$ does not change, though if repeatability is not high and culling
and selection not intense, one would not expect the genetic part of $\sigma^2$ to
change appreciably over a period of many years. We have no easy
alternative to assuming the environmental component remains con- 
stant.* Without this assumption the notion of repeatability would, 
of course, break down, at least partially. 

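The logarithm of the likelihood is equal, apart from a constant, to

$$\begin{aligned}
\log L ={}& -\frac{n_1}{2}\log\sigma^2 - \sum_{(1)}\frac{(y_{i1kt} - \mu - g_t - d_k)^2}{2\sigma^2}\\
& - \frac{n_2}{2}\log\sigma^2 - \frac{n_2}{2}\log(1 - r^2) - \sum_{(2)}\frac{[y_{i2kt} - \mu - g_t - d_k - r(y_{i1k't} - \mu - g_t - d_{k'})]^2}{2\sigma^2(1 - r^2)}\\
& - \frac{n_3}{2}\log\sigma^2 - \frac{n_3}{2}\log\frac{(1 - r)(1 + 2r)}{1 + r}\\
& - \sum_{(3)}\frac{(1 + r)\big[y_{i3kt} - \mu - g_t - d_k - \frac{r}{1 + r}(y_{i1k't} - \mu - g_t - d_{k'} + y_{i2k''t} - \mu - g_t - d_{k''})\big]^2}{2\sigma^2(1 - r)(1 + 2r)}\\
& + \text{etc.},
\end{aligned}$$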
where $\sum_{(1)}$ denotes summation over all first records, $\sum_{(2)}$ summation
over all second records, and so on. An alternative expression for $\log L$
is more suitable for visualizing the likelihood and the problem of
maximizing this likelihood. Let $N$ denote the total number of records
available, $n_1$ the total number of first records, $n_2$ the number of second
records, and so on. Also let $\sum_{[1]}$ denote summation over individuals
with only a first record, $\sum_{[12]}$ summation over individuals with only
first and second records, and so on, and write $z_{ij}$ for $y_{ijkt} - \mu - g_t - d_k$,
$k$ being the year of the $j$th record. Then

$$\begin{aligned}
\log L ={}& -\frac{N}{2}\log\sigma^2 - \frac{n_2}{2}\log(1 - r^2) - \frac{n_3}{2}\log\frac{(1 - r)(1 + 2r)}{1 + r} - \cdots\\
& - \sum_{[1]}\frac{z_{i1}^2}{2\sigma^2} - \sum_{[12]}\frac{z_{i1}^2 - 2r z_{i1}z_{i2} + z_{i2}^2}{2\sigma^2(1 - r^2)}\\
& - \sum_{[123]}\frac{\sum_{j=1}^{3} z_{ij}^2 - \frac{r}{1 + 2r}\big(\sum_{j=1}^{3} z_{ij}\big)^2}{2\sigma^2(1 - r)} - \cdots.
\end{aligned}$$

*The extent to which this assumption is false could be examined by fitting a more general model,
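The two expressions for $\log L$ can be compared numerically for one set of deviations $z_{ij} = y_{ijkt} - \mu - g_t - d_k$ (arbitrary illustrative numbers, not data from the paper). The sketch assumes NumPy, and the helper names are hypothetical: `log_l` builds $\log L$ from the sequential regressions (the $F_1 \times F_2 \times \cdots$ factorization), `joint_log_l` builds it from each individual's equicorrelated covariance matrix, and `sigma2_hat` gives the value of $\sigma^2$ at which $\partial \log L/\partial\sigma^2 = 0$ for fixed $r$: the total of the squared sequential residuals, each divided by its variance factor, over $N$. Both log-likelihoods omit the same constant.

```python
import numpy as np


def sequential_terms(deviations, r):
    """Sum of log variance factors and of squared standardized residuals,
    regressing each record on the earlier records of the same individual."""
    log_factors, quad = 0.0, 0.0
    for z in deviations:
        z = np.asarray(z, dtype=float)
        for j in range(len(z)):                                   # j earlier records
            coef = r / (1 + (j - 1) * r)                          # multiplies an empty sum when j == 0
            factor = (1 - r) * (1 + j * r) / (1 + (j - 1) * r)    # equals 1 when j == 0
            e = z[j] - coef * z[:j].sum()
            log_factors += np.log(factor)
            quad += e * e / factor
    return log_factors, quad


def log_l(deviations, r, sigma2):
    """log L, apart from a constant, from the sequential factorization."""
    n = sum(len(z) for z in deviations)
    log_factors, quad = sequential_terms(deviations, r)
    return -0.5 * n * np.log(sigma2) - 0.5 * log_factors - quad / (2 * sigma2)


def joint_log_l(deviations, r, sigma2):
    """The same quantity from each individual's equicorrelated covariance matrix."""
    total = 0.0
    for z in deviations:
        z = np.asarray(z, dtype=float)
        t = len(z)
        cov = sigma2 * ((1 - r) * np.eye(t) + r * np.ones((t, t)))
        _, logdet = np.linalg.slogdet(cov)
        total += -0.5 * logdet - 0.5 * z @ np.linalg.solve(cov, z)
    return total


def sigma2_hat(deviations, r):
    """sigma^2 at which d log L / d sigma^2 = 0, for given r and deviations."""
    n = sum(len(z) for z in deviations)
    return sequential_terms(deviations, r)[1] / n


z_example = [[1.0], [0.5, -0.2], [2.0, 1.0, 1.5], [-1.0, 0.3, 0.0, 0.7]]
s2 = sigma2_hat(z_example, 0.4)
print(s2, log_l(z_example, 0.4, s2), joint_log_l(z_example, 0.4, s2))
```

The sequential form avoids inverting any matrix, which is what makes the individual-by-individual bookkeeping of the likelihood manageable.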

With records covering $s$ years we shall have $(2s + 3)$ parameters to fit,
of which one $g$ and one $d$ may be chosen arbitrarily. We may as well
carry on the mathematical manipulations symmetrically in all the $g$'s
and $d$'s, and impose necessary conditions at as late a stage as possible.

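We find

$$\begin{aligned}
\frac{\partial\log L}{\partial g_p} ={}& \sum_{[1]p}\frac{y_{i1kp} - \mu - g_p - d_k}{\sigma^2} + \sum_{[12]p}\frac{(y_{i1kp} - \mu - g_p - d_k) + (y_{i2k'p} - \mu - g_p - d_{k'})}{\sigma^2(1 + r)}\\
& + \sum_{[123]p}\frac{\sum_{j=1}^{3}(y_{ijkp} - \mu - g_p - d_k)}{\sigma^2(1 + 2r)} + \text{etc.},
\end{aligned}$$

where $\sum_{[12]p}$, for instance, means summation over all individuals of group
$p$ (i.e. entering the herd in year $p$) with first and second records only.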

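The derivative with regard to a particular d effect, $d_q$ say, is not as
easily reduced to manageable proportions. We find, writing $\epsilon_j$ for the
deviation $y_{ijtk}-\mu-g_t-d_k$ of an individual’s $j$th record,

$$
\begin{aligned}
\frac{\partial\log L}{\partial d_q}={}&\sum_{(q)}\frac{\epsilon_1}{\sigma^2}
+\frac{1}{\sigma^2(1-r)}\Bigl[\sum_{(qq')}\Bigl\{\epsilon_1-\frac{r}{1+r}(\epsilon_1+\epsilon_2)\Bigr\}
+\sum_{(q'q)}\Bigl\{\epsilon_2-\frac{r}{1+r}(\epsilon_1+\epsilon_2)\Bigr\}\\
&+\sum_{(qq'q'')}\Bigl\{\epsilon_1-\frac{r}{1+2r}(\epsilon_1+\epsilon_2+\epsilon_3)\Bigr\}
+\sum_{(q'qq'')}\Bigl\{\epsilon_2-\frac{r}{1+2r}(\epsilon_1+\epsilon_2+\epsilon_3)\Bigr\}
+\sum_{(q'q''q)}\Bigl\{\epsilon_3-\frac{r}{1+2r}(\epsilon_1+\epsilon_2+\epsilon_3)\Bigr\}\\
&+\sum_{(q'q''qq''')}\Bigl\{\epsilon_3-\frac{r}{1+3r}(\epsilon_1+\epsilon_2+\epsilon_3+\epsilon_4)\Bigr\}+\text{etc.}\Bigr],
\end{aligned}
$$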


where, as an example of the summation notation, $\sum_{(q'q''qq''')}$ means
summation of the function in braces over those individuals whose
third record is in year q, and q′, q″, q‴ denote the years in which the
first, second, and fourth records, respectively, of each such individual were
made. These derivatives are indeed formidable, but the method of
obtaining the maximum likelihood estimates will be iterative: one
will make rough estimates of the parameters, use them to
compute the first derivatives numerically, and obtain adjustments to the rough
estimates by solution of linear equations. Alternatively,
if one can determine the best estimate of r, the estimates of $\mu$, $g_t$, $d_k$
are obtained as the solution of linear equations.

The derivative of log L with respect to $\mu$ is the sum of the derivatives
with respect to the g’s, or equally of those with respect to the d’s.

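In addition we have

$$
\frac{\partial\log L}{\partial\sigma^2}=-\frac{N}{2\sigma^2}+\sum_{(1)}\frac{\epsilon_1^2}{2\sigma^4}+\sum_{(1,2)}\frac{\epsilon_1^2+\epsilon_2^2-\frac{r}{1+r}(\epsilon_1+\epsilon_2)^2}{2\sigma^4(1-r)}+\text{etc.},
$$

where $\epsilon_j = y_{ijtk}-\mu-g_t-d_k$ is the deviation of an individual’s $j$th record
from its expectation. It is easy to evaluate $\sigma^2$ by equating this derivative to zero if one has
values for $\mu$, the g’s, the d’s, and r. The derivative with respect to r is
not as simple, for we have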


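$$
\begin{aligned}
\frac{\partial\log L}{\partial r}={}&\frac{n_2r}{1-r^2}+\frac{n_3\,2r(2+r)}{2(1-r)(1+r)(1+2r)}+\frac{n_4\,3r(2+2r)}{2(1-r)(1+2r)(1+3r)}+\frac{n_5\,4r(2+3r)}{2(1-r)(1+3r)(1+4r)}+\text{etc.}\\
&+\sum^{(2)}\Bigl[\frac{\epsilon_1(\epsilon_2-r\epsilon_1)}{\sigma^2(1-r^2)}-\frac{r(\epsilon_2-r\epsilon_1)^2}{\sigma^2(1-r^2)^2}\Bigr]\\
&+\sum^{(3)}\Bigl[\frac{(\epsilon_1+\epsilon_2)\{\epsilon_3-\frac{r}{1+r}(\epsilon_1+\epsilon_2)\}}{\sigma^2(1-r)(1+r)(1+2r)}-\frac{r(2+r)\{\epsilon_3-\frac{r}{1+r}(\epsilon_1+\epsilon_2)\}^2}{\sigma^2(1-r)^2(1+2r)^2}\Bigr]+\text{etc.},
\end{aligned}
$$

where $\epsilon_j = y_{ijtk}-\mu-g_t-d_k$ is the deviation of an individual’s $j$th record
from its expectation, and $\sum^{(2)}$, $\sum^{(3)}$ denote summation over all second
and all third records, respectively, as in the first expression for $\log L$.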

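The maximum likelihood estimates for $\mu$, $g_t$, $d_k$, r, $\sigma^2$ are the values
of these quantities which make the derivatives equal to zero. The
equations obtained by setting the derivatives equal to zero are
nonlinear in the parameters, so the method of solution followed will be
iterative. In a situation with estimation of parameters $\theta_1, \theta_2, \dots, \theta_r$,
a general procedure is to expand the quantities $\partial\log L/\partial\theta_i$ by a Taylor
series around guessed values, ignoring terms of higher order than
quadratic in $\log L$, giving for $\theta_1$, for instance,

$$
\left.\frac{\partial\log L}{\partial\theta_1}\right|_{\theta}=\left.\frac{\partial\log L}{\partial\theta_1}\right|_{\theta_0}+(\theta_1-\theta_{10})\left.\frac{\partial^2\log L}{\partial\theta_1^2}\right|_{\theta_0}+(\theta_2-\theta_{20})\left.\frac{\partial^2\log L}{\partial\theta_1\,\partial\theta_2}\right|_{\theta_0}+\cdots+(\theta_r-\theta_{r0})\left.\frac{\partial^2\log L}{\partial\theta_1\,\partial\theta_r}\right|_{\theta_0}.
$$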
In this equation $\theta$ is one arbitrary point $(\theta_1, \theta_2, \dots, \theta_r)$ in r-dimensional
space and $\theta_0$ or $(\theta_{10}, \theta_{20}, \dots, \theta_{r0})$ is another arbitrary point, and all
derivatives on the right-hand side are evaluated at $(\theta_{10}, \theta_{20}, \dots, \theta_{r0})$.
The maximum likelihood estimates are those for which the left-hand
sides are zero, so by putting these left-hand sides equal to zero and
solving for $\theta_1-\theta_{10}, \theta_2-\theta_{20}, \dots, \theta_r-\theta_{r0}$, we get an approximation to the
maximum likelihood estimates. The closer the original $\theta_{10}, \theta_{20}, \dots, \theta_{r0}$
are to the maximum likelihood values, the smaller will be the steps
from $\theta_{i0}$ to $\theta_i$, and when these steps are negligible we take $\theta_i$ or $\theta_{i0}$ to
be the solution.
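To illustrate this Taylor-series (Newton-Raphson) step in the simplest possible setting, the sketch below treats r as the only unknown: the data are pairs of first and second records already reduced to deviations, with mean 0 and variance 1 assumed known. The score is the derivative of the second-record-given-first likelihood, and the second derivative is computed analytically. The simulated data and the names `score_r`, `second_deriv_r`, and `newton_r` are hypothetical; the full problem would solve the corresponding linear system in all parameters at once.

```python
import random

# Newton-Raphson for r alone, assuming pairs (e1, e2) of deviations with
# known mean 0 and known variance 1 (a deliberately reduced model).

def score_r(r, pairs):
    """First derivative in r of the log likelihood of second records given first."""
    n = len(pairs)
    s = n * r / (1 - r * r)
    for e1, e2 in pairs:
        u = e2 - r * e1                       # second record minus its regression on the first
        s += e1 * u / (1 - r * r) - r * u * u / (1 - r * r) ** 2
    return s

def second_deriv_r(r, pairs):
    """Analytic second derivative, using score = P(r) / (1 - r^2)^2."""
    n = len(pairs)
    s11 = sum(e1 * e1 for e1, _ in pairs)
    s22 = sum(e2 * e2 for _, e2 in pairs)
    s12 = sum(e1 * e2 for e1, e2 in pairs)
    p = n * r * (1 - r * r) + (1 + r * r) * s12 - r * (s11 + s22)
    dp = n * (1 - 3 * r * r) + 2 * r * s12 - (s11 + s22)
    return (dp * (1 - r * r) + 4 * r * p) / (1 - r * r) ** 3

def newton_r(pairs, r0=0.0, tol=1e-10, max_iter=100):
    """Iterate r <- r - score / second derivative until the step is negligible."""
    r = r0
    for _ in range(max_iter):
        step = -score_r(r, pairs) / second_deriv_r(r, pairs)
        while abs(r + step) >= 1:             # keep the iterate inside (-1, 1)
            step /= 2
        r += step
        if abs(step) < tol:
            break
    return r

random.seed(1)
true_r = 0.5
pairs = []
for _ in range(2000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    pairs.append((z1, true_r * z1 + (1 - true_r ** 2) ** 0.5 * z2))

r_hat = newton_r(pairs)
print(round(r_hat, 3))
```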

A modification of the purely mathematical procedure for finding the 
maximum of a function is to use in a statistical situation the expected 
values of the second derivatives rather than the actual derivatives. It 
was in fact in these terms that the iterative method of maximum likeli- 
hood was first presented. As a general rule the authors prefer to use 
actual derivatives but in the present case it can be seen there are con- 
siderable advantages in using the expected values of the second de- 
rivatives. 
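Under the same reduced model (pairs of deviations, mean 0 and variance 1 known, r the only unknown), the expected value of $-\partial^2\log L/\partial r^2$ for n pairs is $n(1+r^2)/(1-r^2)^2$, so the step using expected second derivatives (Fisher scoring) can be sketched as below. This is an illustration of the general idea only, not the computation the authors had in mind for the full model; the function names and simulated data are hypothetical.

```python
import random

# Fisher scoring for r alone: the observed second derivative is replaced by
# its expectation. Assumes pairs (e1, e2) with known mean 0 and variance 1.

def score_r(r, pairs):
    """d log L / d r for the second-record-given-first likelihood."""
    n = len(pairs)
    s = n * r / (1 - r * r)
    for e1, e2 in pairs:
        u = e2 - r * e1
        s += e1 * u / (1 - r * r) - r * u * u / (1 - r * r) ** 2
    return s

def expected_info_r(r, n):
    """E[-d2 log L / dr2] for n pairs: n (1 + r^2) / (1 - r^2)^2."""
    return n * (1 + r * r) / (1 - r * r) ** 2

def scoring_r(pairs, r0=0.0, tol=1e-12, max_iter=500):
    r = r0
    for _ in range(max_iter):
        step = score_r(r, pairs) / expected_info_r(r, len(pairs))
        while abs(r + step) >= 1:             # keep the iterate inside (-1, 1)
            step /= 2
        r += step
        if abs(step) < tol:
            break
    return r

random.seed(2)
true_r = 0.5
pairs = []
for _ in range(2000):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    pairs.append((z1, true_r * z1 + (1 - true_r ** 2) ** 0.5 * z2))

r_hat = scoring_r(pairs)
print(round(r_hat, 3))
```

One practical attraction visible even here is that the expected information depends only on r and the number of pairs, not on the data themselves.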

The problem is strictly a computational one from this point on. It 
does not seem appropriate to pursue the matter further herein, particu- 
larly as we have had no experience with a set of data of the dimensions 
that ordinarily arise. We shall therefore close this section with some 
remarks on the situation, leaving the computational details to a later 
publication. 

It is clear that if we possess a good estimate of r (see below, however)
the problem is strictly one of weighted least squares with known relative
weights. This suggests that one should first estimate r in as good a way
as possible, and then use weighted least squares on linear functions of
the records. One would then presumably go through at least one cycle
of the iteration, which would result in estimates and estimated variances
and covariances of the estimates.
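With r known, the weighted least squares step amounts to solving $X'WX\,b = X'Wy$, where for each cow W is the inverse of the correlation matrix $(1-r)I + rJ$ of her records ($\sigma^2$ cancels from the estimates). The sketch below assumes a hypothetical data layout (years numbered 0, 1, ..., each cow given as her group year and a list of (year, record) pairs), imposes $g_0 = d_0 = 0$ as the two arbitrary choices, and uses a small hand-written Gaussian elimination; `solve` and `gls_trends` are illustrative names, not part of the method as published.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gls_trends(cows, n_years, r):
    """Estimates [mu, g_1..g_{s-1}, d_1..d_{s-1}] with g_0 = d_0 = 0 and r known."""
    p = 1 + 2 * (n_years - 1)
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for t, recs in cows:
        m = len(recs)
        c = r / (1 + (m - 1) * r)
        # Columns holding a 1 in each record's design row
        rows = []
        for k, _ in recs:
            cols = [0]                                 # mu
            if t > 0:
                cols.append(t)                         # g_t
            if k > 0:
                cols.append(n_years - 1 + k)           # d_k
            rows.append(cols)
        for a in range(m):
            for b in range(m):
                # Element (a, b) of [(1 - r)I + rJ]^(-1) = (I - cJ) / (1 - r)
                w = ((1.0 if a == b else 0.0) - c) / (1 - r)
                for i in rows[a]:
                    xtwy[i] += w * recs[b][1]
                    for j in rows[b]:
                        xtwx[i][j] += w
    return solve(xtwx, xtwy)


# Usage: noise-free records mu + g_t + d_k are recovered exactly.
mu, g, d = 400.0, [0.0, 10.0, -5.0, 20.0], [0.0, 15.0, -10.0, 5.0]
cows = [(t, [(k, mu + g[t] + d[k]) for k in range(t, min(t + length, 4))])
        for t in range(4) for length in (1, 2, 3)]
est = gls_trends(cows, n_years=4, r=0.5)
print([round(v, 6) for v in est])
```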

The extent and direction of bias resulting from estimating r without 
taking genetic trend into account is an important and unsolved problem. 
Another question lies in the propriety of applying an r value for one 
herd to another herd when there is no reason to suppose that r is constant. 


A PARTIAL EXAMINATION OF A SET OF DATA 
(KEMPTHORNE AND VON KROSIGK)
The basic ideas of the estimation procedure of Method II have been
explored with the data of the Iowa Board of Control herd at Woodward
for the period 1940 to 1954. The aspects we have examined are the
consistency of the regression equations of a record on preceding records
and the estimation of r.

Considering only first and second records made in successive years,
we examined the homogeneity of the regressions between starting
groups, which are defined as the groups of cows starting their records in
each year. The analysis of variance is given in Table 1.


TABLE 1
TEST FOR HOMOGENEITY OF WITHIN STARTING YEAR REGRESSIONS

Source                                    d.f.          S.S.      M.S.
Among groups                                14    38,210,685
Common regression                            1       257,323
Additional due to separate regressions      13        52,514     4,040
Residual                                   205     1,067,441     5,207
Total                                      233    39,587,963


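There is no evidence at all for different regressions (the mean square
for separate regressions, 4,040, is smaller than the residual mean square,
5,207), so we take, as the estimate from this portion of the data, r equal
to 0.501 with a standard error of 0.071.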
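The comparison behind this conclusion can be reproduced from Table 1 alone: the mean square for separate regressions is divided by the residual mean square, giving a variance ratio on 13 and 205 d.f. The dictionary layout is just a convenience for this sketch.

```python
# Variance ratio for heterogeneity of regressions, from the d.f. and S.S. in Table 1.
table1 = {
    "separate regressions": (13, 52_514),
    "residual": (205, 1_067_441),
}
mean_squares = {source: ss / df for source, (df, ss) in table1.items()}
f_ratio = mean_squares["separate regressions"] / mean_squares["residual"]

# The four components add up to the tabulated total, 39,587,963.
total_ss = 38_210_685 + 257_323 + 52_514 + 1_067_441

print(round(mean_squares["separate regressions"]), round(mean_squares["residual"]), round(f_ratio, 2))
# prints: 4040 5207 0.78
```

A ratio below unity gives no indication that the separate regressions differ.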

For the portion of the data with first, second, and third records in
three successive years, we found that the within-group partial regressions
of the third record on the first record and on the second record were
0.278 ± 0.115 and 0.501 ± 0.133, respectively. The difference of these two
coefficients leads to a t value equal to 1.1, which in no way contradicts
the hypothesis of equality of the two coefficients. The estimate forcing
the two partial regression coefficients to be equal was 0.378 ± 0.071.

For the portion of the data with four records in successive years the
multiple regression of fourth record on first, second, and third gave
partial regression coefficients as follows:

on first: 0.145 ± 0.218
on second: 0.268 ± 0.184
on third: 0.228 ± 0.140

The F value for differences of these was 0.07, which does not contradict
the hypothesis of equality. The estimate forcing the three partial
coefficients to be the same was 0.218 ± 0.066.

These three estimates are independent. They can best be represented
in tabular form; Table 2 shows the estimates, the resulting estimates
of r, and the weights.

TABLE 2
ESTIMATES OF REPEATABILITY

Source         Estimate        Estimate of r     Weight
r              0.501 ± .071    0.501 ± .071         198
r/(1 + r)      0.378 ± .071    0.608 ± .184          29
r/(1 + 2r)     0.218 ± .066    0.387 ± .208          23

The best estimate of r from the regression coefficients is 0.503 ± .063.
This estimate has been used to obtain estimates of yearly environmental
trends from first and second records only. The computations are
summarized in Table 3. The sum of the estimates of the yearly
environmental trends gives an estimate of 31 ± 98 pounds of butterfat
for the total change in environment.
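The pooling in Table 2 can be sketched as follows, assuming (as the tabulated weights suggest) inverse-variance weighting of the three independent estimates. Each coefficient of the form r/(1 + m r) is converted back to r, and its standard error is carried over by the delta method; the converted values agree with Table 2 to within rounding in the last digit, and the computed weights (about 198, 30, and 23) are close to the tabulated ones.

```python
# Convert each regression coefficient to an estimate of r, propagate its standard
# error by the delta method, and pool with inverse-variance weights.
estimates = [
    # (coefficient, s.e., m) for a coefficient of the form r / (1 + m r)
    (0.501, 0.071, 0),
    (0.378, 0.071, 1),
    (0.218, 0.066, 2),
]

def to_r(b, se, m):
    """Invert b = r / (1 + m r): r = b / (1 - m b), with dr/db = 1 / (1 - m b)^2."""
    r = b / (1 - m * b)
    return r, se / (1 - m * b) ** 2

converted = [to_r(*e) for e in estimates]
weights = [1 / se ** 2 for _, se in converted]
r_hat = sum(w * r for w, (r, _) in zip(weights, converted)) / sum(weights)
se_hat = sum(weights) ** -0.5
print([(round(r, 3), round(se, 3)) for r, se in converted])
print(round(r_hat, 3), round(se_hat, 3))
```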


TABLE 3
ESTIMATED YEARLY ENVIRONMENTAL EFFECTS

Year                                         Estimate ± s.e.
1940    26    20    360.8    363.7    422.0     59.7 ± 18.0
1941    14    11    382.7    403.6    431.6     38.4 ± 24.3
1942    22    22    449.3    449.3    428.1    −21.2 ± 17.6
1943    13     9    373.1    414.6    414.1     20.1 ± 26.7
1944     6     3    423.4    413.3    355.0    −63.3 ± 44.8
1945    13    10    352.2    344.2    335.8    −12.4 ± 25.4
1946     5     2    363.4    327.0    363.0     17.9 ± 54.2
1947    37    31    354.5    357.7    345.2    −10.9 ± 14.6
1948    21    16    385.1    411.7    388.3    −10.2 ± 20.0
1949    31    23    388.3    415.7    441.0     38.9 ± 16.8
1950    31    22    411.6    447.3    372.3    −57.3 ± 17.2
1951    28    18    361.4    389.8    375.3     −0.4 ± 18.7
1952    39    19    376.2    401.0    444.2     55.5 ± 17.8
1953    40    27    464.0    473.1    444.6    −24.0 ± 15.3
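The total of 31 ± 98 pounds can be checked from the last column of Table 3. The standard error below treats the fourteen yearly estimates as independent, an assumption made for this sketch under which their variances simply add; it reproduces the published figure.

```python
import math

# (year, estimated environmental effect, standard error) from the last column of Table 3
effects = [
    (1940, 59.7, 18.0), (1941, 38.4, 24.3), (1942, -21.2, 17.6),
    (1943, 20.1, 26.7), (1944, -63.3, 44.8), (1945, -12.4, 25.4),
    (1946, 17.9, 54.2), (1947, -10.9, 14.6), (1948, -10.2, 20.0),
    (1949, 38.9, 16.8), (1950, -57.3, 17.2), (1951, -0.4, 18.7),
    (1952, 55.5, 17.8), (1953, -24.0, 15.3),
]
total = sum(e for _, e, _ in effects)
# Assuming independent yearly estimates, the variance of the sum is the sum of variances.
se_total = math.sqrt(sum(s * s for _, _, s in effects))
print(round(total), round(se_total))  # 31 98
```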


To illustrate the importance of the assumption that age effects have
been eliminated, suppose all of the cows in the above example freshened
for the first time at two years and two months. Then, if these first
records were multiplied by, say, 1.25 instead of 1.28, the estimate of the
total environmental change would be increased by approximately 120
pounds. Of course, if records at all ages were used, any effect of ages being
incorrectly discounted would be damped down. However, first and
second records will always comprise the major share of the data.
If a term were added to the model for ages so that the effect of ages
could be estimated simultaneously with the other factors, one would 
have more confidence in the estimates. However, it can be seen, as 
pointed out by Rendel and Robertson [1950], that if ages and starting 
dates were both classified with the same accuracy they would be per- 
fectly confounded and there would not be a unique solution to the 
equations. It does not seem logically defensible to classify the records 
by years for start of lactation and by, say, months for ages in order to 
break down the perfect correlation between the two. 


EQUIVALENCE OF THE METHODS (SEARLE) 


Method II uses four subscripts on y, $y_{ijtk}$ being the record made in
year k by the ith cow of the group of cows whose first records were made
in year t, it being the cow’s jth record. Thus j is no more than an
ordinal indicator of which record of the cow $y_{ijtk}$ is, and in terms of the
elements of the model it plays no part. In the case where a cow’s
second record is always made in the year immediately following that of
her first record, her third record always comes in the year following
that of her second, and so on, j = k + 1 − t. The presence of j as
a subscript to y merely emphasizes that records may not occur in
consecutive years; when they do, as in many situations, j is redundant
because of the above relationship. In either case the model of Method
I is applicable:

$$y_{ijtk}=\mu+g_t+d_k+c_{it}+e_{ijtk}.$$
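In this model the random variables are the c’s and e’s. For the cow
(i, t) having $n_{it}$ records it is convenient to define the following column
vectors: (i) $y_{it}$, her $n_{it}$ records, (ii) $u_{it}$, a vector of $n_{it}$ 1’s, and (iii)
$d_{it}$ and $e_{it}$, the d’s and e’s appropriate to her $n_{it}$ records. Then the
likelihood of the sample of y’s for all cows is

$$
L=\prod_{i,t}\frac{1}{(2\pi)^{n_{it}/2}\,|A_{it}|^{1/2}}\exp\Bigl[-\tfrac12(c_{it}u_{it}+e_{it})'A_{it}^{-1}(c_{it}u_{it}+e_{it})\Bigr],
$$

where $A_{it}$ is a square matrix of order $n_{it}$, all diagonal terms being
$\sigma^2=\sigma_c^2+\sigma_e^2$, and all non-diagonal terms being $r\sigma^2$, where r is
repeatability. This is the variance-covariance matrix of the $n_{it}$ records of
the cow (i, t). Due to its special form the likelihood can be expanded
and shown to be equal to

$$
L=\frac{\bigl[2\pi(1-r)\sigma^2\bigr]^{-\sum n_{it}/2}\exp\Bigl\{-\sum_{i,t}\frac{e_{it}'e_{it}}{2(1-r)\sigma^2}\Bigr\}\,(2\pi r\sigma^2)^{-M/2}\exp\Bigl\{-\sum_{i,t}\frac{c_{it}^2}{2r\sigma^2}\Bigr\}}
{\bigl[2\pi r(1-r)\sigma^2\bigr]^{-M/2}\prod_{i,t}\bigl[1+(n_{it}-1)r\bigr]^{1/2}\exp\Bigl\{-\sum_{i,t}\frac{[1+(n_{it}-1)r](c_{it}-\hat c_{it})^2}{2r(1-r)\sigma^2}\Bigr\}},
$$

with $\hat c_{it}=r\,u_{it}'(y_{it}-\mu u_{it}-g_tu_{it}-d_{it})/[1+(n_{it}-1)r]$ the conditional mean of
$c_{it}$ given the cow’s records,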
where M is the number of cows. This expression is the likelihood of the
sample of y’s. It is therefore the likelihood used in Method II, although
it was there written as the product over all cows of the conditional
likelihoods

$$\prod f_1\,f_{2\mid1}\,f_{3\mid2,1}\cdots,$$

equal to

$$\prod f_1(y_{i1tk})\,f_2(y_{i2tk'}\mid y_{i1tk})\,f_3(y_{i3tk''}\mid y_{i2tk'},y_{i1tk})\cdots.$$

With the substitution

$$e_{ijtk}=y_{ijtk}-\mu-g_t-d_k-c_{it},$$

the numerator of L as given above is the joint distribution function
which has been maximized in Method I with respect to $\mu$, the d’s, g’s,
and the c’s. In Method II, L has been maximized with respect to $\mu$,
the d’s, g’s, and r. Therefore the Method I equations with the
following three amendments will yield the Method II equations:


(i) Include the equations arising from maximizing L with respect
to r.

(ii) Subtract from the equations arising from the differentiation
with respect to $\mu$, the d’s, and g’s, those terms arising from
differentiating the logarithm of the denominator of L with
respect to these same parameters.

(iii) Delete the equations arising from maximizing L with respect to
the $c_{it}$’s.


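In the situation where r is assumed known and is not to be estimated,
the equations of (i) will not occur.

As an example of the terms in (ii), consider the equation arising
from differentiation with respect to $\mu$. In Method I this will come from
equating to zero the expression

$$\frac{\partial}{\partial\mu}\Bigl[-\sum_{i,t}\frac{e_{it}'e_{it}}{2(1-r)\sigma^2}\Bigr],$$

and this is

$$\sum_{i,t}\frac{u_{it}'(y_{it}-\mu u_{it}-g_tu_{it}-d_{it})-n_{it}c_{it}}{(1-r)\sigma^2}.$$

In Method II there will be added to this expression the following (with
$\hat c_{it}=r\,u_{it}'(y_{it}-\mu u_{it}-g_tu_{it}-d_{it})/[1+(n_{it}-1)r]$, the conditional mean of $c_{it}$
given the cow’s records):

$$\frac{\partial}{\partial\mu}\sum_{i,t}\frac{[1+(n_{it}-1)r](c_{it}-\hat c_{it})^2}{2r(1-r)\sigma^2},$$

which equals

$$\sum_{i,t}\frac{[1+(n_{it}-1)r](c_{it}-\hat c_{it})}{r(1-r)\sigma^2}\cdot\frac{n_{it}r}{1+(n_{it}-1)r}=\sum_{i,t}\frac{n_{it}(c_{it}-\hat c_{it})}{(1-r)\sigma^2}.$$

In the difference the terms in $c_{it}$ are

$$\sum_{i,t}\frac{(-n_{it}+n_{it})\,c_{it}}{(1-r)\sigma^2}=0,$$

i.e. the c’s are eliminated from the equation; and this will be found to be
true for the equations corresponding to the differentiations with respect
to the d’s and the g’s. Thus, together with (iii), we see that the equations
of Method II are those of Method I after the c’s have been eliminated,
namely equations (6) and (7) of Method I.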

Hence we have shown that when the repeatability r is assumed
known, the two methods are equivalent. That is, maximizing the
likelihood with respect to $\mu$, the $d_k$’s, and the $g_t$’s (as in Method II)
gives the same equations as result from eliminating the $c_{it}$’s from the
equations obtained by maximizing the joint distribution function of the
y’s and c’s with respect to $\mu$, the $d_k$’s, the $g_t$’s, and the $c_{it}$’s (as in Method
I). Thus when r is assumed known, the equations of Method II are
simply those of Method I with the c’s eliminated; hence the two
methods give the same estimates of estimable linear functions of the
fixed effects $\mu$, $d_k$, and $g_t$. Furthermore, since it has been proved in
Method II that these estimates are unbiased by selection, such
unbiasedness also applies to Method I.


SUMMARY 


Two methods for the separation of genetic and environmental time 
trends are presented and compared. The reason the classical least 
squares approach to this problem yields biased estimates is described in 
standard statistical terminology. The first method is illustrated with 
a small numerical example and a partial examination of a set of data is 
used to illustrate some of the assumptions and the estimation of param- 
eters in Method II. 

The methodology is presented and discussed in terms of dairy records. 
However, the same technique would apply to many other cases, often 
without the complication of age-correction factors. 
EXPERIMENTAL DESIGN IN THE EVALUATION OF
GENETIC PARAMETERS 


ALAN ROBERTSON* 


Institute of Animal Genetics 
Edinburgh, Scotland 


Rather surprisingly, there appears to be no discussion in the liter- 
ature of the best design for the estimation of intraclass correlations and, 
consequently, in the context of quantitative genetics, of heritabilities. 
Put rather more specifically, if we have facilities for the measurement 
of a certain number of objects with a hierarchical classification, what is 
the most efficient group size for the estimation of intra-group corre- 
lations? We shall be dealing here with the situation in which, in any 
experiment, all groups are of the same size. As more work is now being 
done on the quantitative genetics of laboratory animals, in which the 
family structure is rather more under the experimenter’s control than 
in farm animals, it seemed valuable to put these results on paper. 


Single Classification 


In the case of a single classification, e.g. N families each of n
half-sibs, the solution is exceedingly simple. We then make use of Fisher’s
[1941] formula for the sampling variance of the intraclass correlation t,

$$V(\hat t)=\frac{2(1-t)^2[1+(n-1)t]^2}{n(n-1)(N-1)},$$

on which we must impose the restriction that Nn shall be equal to a
constant (T, the total population number). If N is large (and n − 1 is
replaced by n), this can be written

$$V(\hat t)\approx\frac{2(1-t)^2(1+nt)^2}{Tn},$$

which, considered as a function of n, has a minimum when nt = 1.


The optimum group size is then the reciprocal of the intraclass
correlation. At the optimum, we have

$$V(\hat t)\approx\frac{8t(1-t)^2}{T}=\frac{8t^2(1-t)^2}{N}\approx\frac{8t^2}{N}.$$
*Member of Staff of Agricultural Research Council, Great Britain. 
Thus, with the optimum group size, the sampling coefficient of variation
of $\hat t$ will be roughly $\sqrt{8/N}$. Had we the same number of infinitely
large groups, the coefficient of variation would be exactly half this,
$\sqrt{2/N}$.
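A quick numerical check of the optimum: the sketch below minimizes the large-N approximation to $V(\hat t)$ over integer group sizes for T = 1000 (the total used in Fig. 1) and reports the coefficient of variation at the optimum next to $\sqrt{8/N}$. The function name and the search range 2 to 500 are arbitrary choices for the illustration.

```python
def approx_var(n, t, total):
    """Large-N approximation: V(t_hat) ~ 2 (1 - t)^2 (1 + n t)^2 / (T n)."""
    return 2 * (1 - t) ** 2 * (1 + n * t) ** 2 / (total * n)

total = 1000
best = {}
for t in (0.01, 0.02, 0.05):
    n_opt = min(range(2, 501), key=lambda n: approx_var(n, t, total))
    best[t] = n_opt
    n_groups = total / n_opt
    cv = approx_var(n_opt, t, total) ** 0.5 / t
    # The coefficient of variation is sqrt(8 / N) reduced by the factor (1 - t).
    print(t, n_opt, round(cv, 3), round((8 / n_groups) ** 0.5, 3))
```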

A somewhat more general formula can be obtained, of relevance
when T is small (or when the analysis is done within a superclassification
reducing the effective value of T). With the restriction Nn = K/t, we
write the formula for the sampling variance

$$V(\hat t)\approx\frac{2(1-t)^2(1+nt)^2}{(N-1)n^2}.$$

This proves to have a minimum when N = K + 2 and nt = K/(K + 2).
The optimal value of n is then somewhat reduced if the effective value
of T is small.
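The location of this minimum can be confirmed by direct search. Substituting nt = K/N makes the variance proportional to $(1+K/N)^2/[(N-1)(K/N)^2]$, a function of N alone; the sketch scans integer N for a few illustrative values of K (the values of K and the function name are arbitrary).

```python
def scaled_var(N, K):
    """V(t_hat) / (2 (1 - t)^2 t^2) under N n = K / t, written in terms of N only."""
    nt = K / N
    return (1 + nt) ** 2 / ((N - 1) * nt ** 2)

optimum = {}
for K in (1, 2, 5, 10):
    N_best = min(range(2, 201), key=lambda N: scaled_var(N, K))
    optimum[K] = N_best
    print(K, N_best, round(K / N_best, 3))   # optimal N and the corresponding n t
```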

In most cases, we have some a priori knowledge of the magnitude
of t. What if this is lacking? In genetic studies, we are mostly
concerned with a range of t from 0.01 to 0.1 (implying, in the case of
half-sibs, heritability values from 0.04 to 0.4). Fig. 1 shows the plot of
$V(\hat t)$ against n for T = 1000 and four values of t. The curves show the
expected minimum when nt = 1, and their shape suggests that, in the
absence of other information than that t probably lies in this range, a
value of n from 20 to 30 would lead to the least loss of information. It
will be seen that the curves rise steeply for low values of n, and these
should be avoided if at all possible.

FIGURE 1
THE SAMPLING VARIANCE OF THE INTRACLASS CORRELATION COEFFICIENT, t,
FOR A TOTAL OF 1000 INDIVIDUALS MEASURED.
[Plot not reproduced: $V(\hat t)$ against sire family size, with curves labelled t = 0.05, 0.02, and 0.01.]
It should be noted that, when n is large and t is small, the
distribution of $\hat t$ will be extremely skew, as the lower bound is $-1/(n-1)$.
The variance estimate would have to be used with caution in this case.


The Simultaneous Measurement of Sire and Dam Components 


In fact, the structure is often not simple but is a hierarchical or
“nested” one, in the sense that each sire group is made up of several
dam groups and that two components (and correspondingly two
estimates of heritability) can be obtained. Using the symbols s, d, n to
represent the number of sires, dams per sire, and offspring per dam
respectively, the analysis of variance is as follows, with $t_1$, $t_2$ as the sire
and dam intraclass correlations, and $\sigma_T^2$ as the total variance.

Source                       d.f.         Expected Mean Square
Between sires                s − 1        $\sigma_T^2(1-t_1-t_2+nt_2+ndt_1)$
Between dams within sires    s(d − 1)     $\sigma_T^2(1-t_1-t_2+nt_2)$
Within dams                  sd(n − 1)    $\sigma_T^2(1-t_1-t_2)$


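Following the example of Osborne and Patterson [1952], we may
make use of the fact that the three mean squares are independent and
that $t_1$, $t_2$ can be expressed as linear combinations of them. Omitting
a lot of tedious algebra and putting $t_1 = t_2 = t$ in the expectations, we
have

$$
V(\hat t_1)\approx\frac{2(1-t)^2[1+(nd+n-2)t]^2}{(s-1)d^2n^2}+\frac{2[1+(d-1)t]^2[1+(n-2)t]^2}{sd^2(d-1)n^2}+\frac{2(n-1)t^2(1-2t)^2}{sdn^2}
$$

and

$$
V(\hat t_2)\approx\frac{2t^2[1+(nd+n-2)t]^2}{(s-1)d^2n^2}+\frac{2[d-(d-1)t]^2[1+(n-2)t]^2}{sd^2(d-1)n^2}+\frac{2[1+(n-1)t]^2(1-2t)^2}{sdn^2(n-1)}.
$$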
Dealing first with $V(\hat t_1)$ and imposing the condition that sdn = T,
we see that the last term, being of order $t^2$ and having a large denominator,
can be ignored, and that the first term is greater than the second,
especially if d is large. Comparing the first term with that in the simple
case discussed earlier, we see that $V(\hat t_1)$ will have a minimum with
respect to sire family size, nd, when $n(d+1)t \approx 1$, similar to that
derived for the simple case. Accepting this as giving the optimum
value for nd, we find that the second term now determines the best
value for n. It is, in fact, n = 1 and dt = 1. Thus the best estimate
of the sire correlation is obtained when all progeny of a sire have
different dams and the family size is that given in the earlier section. If
this is not acceptable (and here the estimation of $t_2$ comes into account),
there appears to be a considerable improvement in going from d = 2
to d = 3, and then the return from a further increase drops off (see Fig.
2 for an example).
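The effect of d at a fixed sire family size can be illustrated numerically. The sketch below evaluates a three-term approximation to $V(\hat t_1)$ (written out in the code comment, obtained by treating $\hat t_1$ as a ratio of components estimated from three independent mean squares with $t_1 = t_2 = t$) for T = 1200 and t = 0.05, the values used for Figs. 2 and 3, with nd held at 1/t = 20. Non-integer n is allowed purely for the illustration.

```python
# Approximate V(t1_hat), s sires, d dams per sire, n offspring per dam, t1 = t2 = t:
#   2(1-t)^2 [1+(nd+n-2)t]^2 / ((s-1) d^2 n^2)
# + 2[1+(d-1)t]^2 [1+(n-2)t]^2 / (s d^2 (d-1) n^2)
# + 2(n-1) t^2 (1-2t)^2 / (s d n^2)
def var_t1(s, d, n, t):
    a = 2 * (1 - t) ** 2 * (1 + (n * d + n - 2) * t) ** 2 / ((s - 1) * d ** 2 * n ** 2)
    b = 2 * (1 + (d - 1) * t) ** 2 * (1 + (n - 2) * t) ** 2 / (s * d ** 2 * (d - 1) * n ** 2)
    c = 2 * (n - 1) * t ** 2 * (1 - 2 * t) ** 2 / (s * d * n ** 2)
    return a + b + c

T, t, family = 1200, 0.05, 20          # sire family size nd = 1/t
s = T // family
v = {d: var_t1(s, d, family / d, t) for d in (2, 3, 4, 5, 6)}
for d, val in v.items():
    print(d, round(val * 1e4, 2))       # variance x 10^4
```

The drop from d = 2 to d = 3 is roughly three times the drop from d = 3 to d = 4, in line with the diminishing return described above.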

Turning now to the expression for $V(\hat t_2)$, the first term is negligible,
and the second is greater than the third, especially if n is large.
Considering, firstly, the problem as merely one of minimising $V(\hat t_2)$, ignoring
the value of $V(\hat t_1)$, the formula for $V(\hat t_2)$ would suggest that nt should equal
unity. But our minimum for $V(\hat t_1)$ was that ndt should be close to
unity, so that the two are incompatible. We must then consider the
value of $V(\hat t_2)$ for fixed values of nd, with particular emphasis on the
region around ndt = 1.

We are then faced with an analysis within small sub-groupings,
which was dealt with earlier. It was then shown that, in the present
terminology, if ndt = K, the optimal values of d and n are K + 2 and
K/[t(K + 2)] respectively. Thus if the sire families are much bigger
than optimum for the estimation of $t_1$, the optimal dam family size is
given by nt = 1. But if the sire family size is in the region of the
optimum for $t_1$, i.e. ndt ≈ 1, then the optimum number of dams per sire
for the best estimation of $t_2$ will be 3.

We cannot then devise an optimum structure for the estimation
of both $t_1$ and $t_2$. Suppose, however, we wish to compare $\hat t_1$ and $\hat t_2$
and therefore wish them to have the same sampling variance.
Examination of the formulae with this condition suggests that the optimum
lies with d between 4 and 10 and with n equal to $1/[t(d+1)^{1/2}]$. The
optimum sire family size would then be in the region of 2/t to 3/t.

Figs. 2 and 3 have been drawn to act as an example with T = 1200
and t = 0.05. For any increase in t, the abscissae of the curves would be
proportionately reduced and the ordinates proportionately increased.
It may happen that $t_2$ is inflated for non-genetic reasons such as
pre- and post-natal maternal environment and subsequent common
environment of members of a full-sib group. In such a case, the estimate
of heritability from $\hat t_2$ would be valueless, and the variance between
full-sib groups is the main source of error in the estimation of $t_1$. The
number in each full-sib group, n, becomes of less importance than d,
the number of full-sib groups per sire. The optimum value of d in the
estimation of $t_1$ is then very roughly given by $d = t_2/t_1$.

FIGURE 2
THE SAMPLING VARIANCE OF THE SIRE INTRACLASS CORRELATION COEFFICIENT,
WITH 1200 INDIVIDUALS MEASURED AND h² = 0.20. d = NUMBER OF DAMS
PER SIRE.
[Plot not reproduced: variance against sire family size for several values of d.]

The general rules then appear to be:

(i) when the magnitude of the heritability is known and the expected
intraclass correlation t equals h²/4,

(a) for a half-sib analysis, the optimum family size is given by
nt = 1,

(b) for a sire and dam analysis in which it is desired to get equal
information on the two correlations, the dam family size should
be given by nt = ½ with 3 or 4 dams per sire. Even if this
family size cannot be achieved, it still appears desirable to
have 3 to 4 dams per sire. If the family size is below the
optimum, the sire correlation will be estimated more accurately
than the dam correlation, and vice versa.

(c) A structure involving small groups of 2 or 3 animals per family
is most inefficient, as the variances of the intraclass correlations
become extremely high.

(ii) if there is no previous evidence on the heritability, then

(a) the optimum family size for a half-sib analysis is 20 to 30,

(b) in a sire and dam analysis, the optimum dam family size is
10 with 3 or 4 dams per sire.

FIGURE 3
THE SAMPLING VARIANCE OF THE DAM INTRACLASS CORRELATION COEFFICIENT,
$t_2$, WITH 1200 INDIVIDUALS MEASURED AND h² = 0.20. d = NUMBER OF DAMS
PER SIRE.
[Plot not reproduced: variance against sire family size for several values of d.]


Discussion 


We have been here discussing the sampling variances of heritability
estimates derived from intraclass correlations. It might reasonably
be asked how these compare with the parent-offspring estimates. In
the former case, we had, with optimum structure in a half-sib analysis
(and since $h^2 = 4t$ for half-sibs, so that $V(\hat h^2) = 16V(\hat t)$),

$$V(\hat t)\approx\frac{8t}{T},\quad\text{leading to}\quad V(\hat h^2)\approx\frac{32h^2}{T},$$
where T is the total number of animals measured. In the
parent-offspring case with N pairs, we have $V(\hat h^2) = 4/N = 8/T$, since we must
measure 2 animals for each pair. Then, for a given amount of
measurement, the two methods have equal efficiency if $h^2 = 0.25$, with the
half-sib method having the greater efficiency at lower heritabilities. This
conclusion is perhaps not surprising when we recall that the
parent-offspring method is equivalent to a full-sib analysis with n = 2, whose
relative efficiency must increase as $h^2$ increases (i.e. as the optimum
family size decreases).
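The break-even point follows from equating the two approximate variances, $32h^2/T = 8/T$. The sketch below simply evaluates both expressions from the text for a few heritabilities at a fixed number of animals measured (T = 1000 is arbitrary, since it cancels from the ratio).

```python
# Approximate sampling variances of h^2 for T animals measured (from the text):
#   optimal half-sib design:           V(h2) ~ 32 h2 / T
#   parent-offspring, N = T / 2 pairs: V(h2) ~ 4 / N = 8 / T
def var_half_sib(h2, T):
    return 32 * h2 / T

def var_parent_offspring(T):
    return 8 / T

T = 1000
for h2 in (0.1, 0.25, 0.4):
    ratio = var_parent_offspring(T) / var_half_sib(h2, T)
    print(h2, round(ratio, 2))   # ratio > 1: the half-sib design is more efficient

break_even = 8 / 32              # h2 at which 32 h2 / T = 8 / T
print(break_even)
```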

Recently the author [1957] discussed the optimum structure of a 
population with a single classification from the point of view of securing 
the maximum genetic improvement by family selection in a unit of 
given size. It may be of interest to contrast the two. In maximising 
improvement, the family size is dependent mostly on the size of the 
total population and only to a small extent on the heritability. On the 
other hand, in maximising information, the optimum family size is 
independent of population size (unless this is small) but inversely 
proportional to the heritability. 

Finally, it must be noted that the optimum structure for the measure- 
ment of heritability is also the optimum for the measurement of the 
genetic correlation between two characters. The formulae for the 
variance of the latter, which will be presented in another paper, are 
very similar, considered as functions of family size, to those for the 
heritability although the variances are larger in magnitude. 


Summary 


The optimum population structure for the estimation of intraclass 
correlations and genetic parameters by analysis of variance methods 
has been discussed. The conclusions are: 


(i) family sizes of the order of 2 to 3 are extremely inefficient.

(ii) with a single classification and correlation t, the optimum
family size is in the neighborhood of 1/t.

(iii) with a double classification of sire and dam and equal intraclass
correlations in the two cases

(a) the best estimate of the sire correlation is with one
progeny per dam and sire family size as above,

(b) for optimum equal information on both correlations, the
structure has 3 or 4 dams per sire and 1/(2t) offspring per
dam.

(iv) for a given number of animals measured, the estimate of
heritability obtained from a half-sib analysis with optimal structure
is more accurate than that from a parent-offspring regression
if the heritability is less than 0.25, and vice versa.
A DISTRIBUTION-FREE ASYMPTOTIC METHOD
OF ESTIMATING, TESTING, AND SETTING 
CONFIDENCE LIMITS FOR HERITABILITY* 


LORRAINE SCHWARTZ AND STANLEY WEARDEN 


Kansas State University 
Manhattan, Kansas, U.S.A. 


Introduction 


The genetic parameter heritability indicates the extent to which 
phenotypic variation is controlled by gene action. Its detection is 
often one of the chief purposes of a genetic experiment. Many different 
procedures have been used to estimate heritability when the observable 
random variables can be assumed to be normally distributed. For 
example, when a linear regression model is used, the regression coefficient, 
or slope of the regression line, is taken to be a measure of heritability 
and normal theory is used to test hypotheses about this parameter and 
to find confidence regions for it. However, there are certain character- 
istics of interest to the geneticist which are known to be distributed 
in a non-normal fashion; for example, the social order of fowl has a 
uniform distribution. In this paper, the expectation of a rank order 
statistic closely related to the Mann-Whitney U statistic is suggested 
as a measure of heritability when the normal assumption is not appropri- 
ate and in particular when rank within a group is a meaningful measure 
of the trait under consideration. 


Heritability 


For experiments in which selection is conducted in opposite
directions for the same trait, there is a distribution-free method for estimating
heritability. The result may be described as the ratio of change per
generation to the average selection differential, or

$$H=f\,\frac{\Delta o}{\Delta p},\tag{1}$$

a constant f times the ratio of the difference between the arithmetic
means of the two groups of progeny ($\Delta o$) to the average selection
differential in the parental generation ($\Delta p$). The constant f is unity
if selection is performed on both parents. If selection is made with
respect to only one of the parents, the ratio estimates only one half
of the hereditary effect and must be doubled to give an unbiased (in
the genetic sense) estimate of heritability.

*Contribution No. 39, Statistical Laboratory, Kansas Agricultural Experiment Station,
Manhattan.

However, when it is desired to test hypotheses or find confidence 
intervals for heritability [defined here by E(H)], then the distribution 
of H must be known. Hence, it is suggested that the actual observations 
be transformed to rank order values when such ranks still classify the 
data reliably. The rank order analog of (1), denoted below by H, , is 
shown to be a function of the Mann-Whitney U statistic, and properties 
of this statistic are employed for the objectives mentioned above. 


The Mann-Whitney U Statistic 


Suppose the random variables $X_1, \dots, X_{n_1}, Y_1, \dots, Y_{n_2}$ are
completely independent and have continuous cumulative distribution
functions $F(s) = P\{X_i < s\}$ and $G(s) = P\{Y_j < s\}$, where $i = 1, \dots, n_1$,
$j = 1, \dots, n_2$, and F and G are unknown. Let the X’s and
Y’s be arranged in order and let U count the number of times a Y
variable precedes an X variable. The statistic T, which is the sum of
the ranks of the Y’s among the ordered sequence of X’s and Y’s, was
proposed by Wilcoxon [1945] as a test criterion for testing the
hypothesis that F and G are identical [$F(s) \equiv G(s)$] against all alternatives
of the form $G(s) \le F(s)$ for all real s with strict inequality for some s
(i.e. that the Y’s are stochastically larger than the X’s). Wilcoxon’s
test, which is to reject the hypothesis for “large” values of T, was
proposed for the case when $n_1 = n_2$. Later, Mann and Whitney [1947]
proposed essentially the same test for arbitrary $n_1$ and $n_2$ and found
the distribution, in terms of a recursion formula, for the test statistic
U, which is related to T by

$$U = n_1n_2+\frac{n_2(n_2+1)}{2}-T.$$

The test in its final form is, then, to reject the hypothesis of equality
of distributions when the observed value of U is “too small,” the
critical values of U being determined by the level of significance.
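The relation between U and Wilcoxon’s T can be checked directly on a small set of untied observations. The sample values are invented for illustration; U is counted from its definition (a Y preceding an X) and T from the combined ranking.

```python
def mann_whitney_u(xs, ys):
    """U = number of (X, Y) pairs in which the Y precedes (is smaller than) the X."""
    return sum(1 for x in xs for y in ys if y < x)

def rank_sum_y(xs, ys):
    """Wilcoxon's T: sum of the ranks of the Y's in the combined ordering (no ties)."""
    combined = sorted([(v, "x") for v in xs] + [(v, "y") for v in ys])
    return sum(rank for rank, (_, label) in enumerate(combined, start=1) if label == "y")

xs = [1.2, 3.4, 5.1, 7.7]
ys = [2.0, 4.4, 6.3, 8.1, 9.0]
n1, n2 = len(xs), len(ys)
u = mann_whitney_u(xs, ys)
t_stat = rank_sum_y(xs, ys)
print(u, t_stat, n1 * n2 + n2 * (n2 + 1) // 2 - t_stat)  # 6 29 6
```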


Whatever the relationship of F to G may be, the expectation and
variance of U are

$$E(U)=n_1n_2p,\tag{2}$$

$$\sigma_U^2=n_1n_2\bigl[(n_1-1)\phi^2+(n_2-1)\gamma^2+p(1-p)\bigr],\tag{3}$$

where $p=P\{Y_j<X_i\}$, $\phi^2=P\{X_i,X_k<Y_j\}-(1-p)^2$, and
$\gamma^2=P\{Y_j,Y_l<X_i\}-p^2$, as shown by Mann and Whitney [1947]; the
notation is that of Birnbaum and Klose [1957].

If the distributions F and G are identical, then $p = \tfrac12$ and $\varphi^2 = \gamma^2 =
1/12$. Thus, under the hypothesis of equality of distributions,

$$E(U) = \frac{n_1 n_2}{2} \quad\text{and}\quad \sigma_U^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}. \qquad (4)$$


For this special situation, the distribution of U for small values of
$n_1$ and $n_2$ may be found in tables calculated by Mann and Whitney
[1947], while for larger values it is closely approximated by the normal
distribution.
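For small samples, the null distribution of U can be generated exactly. The sketch below assumes F and G are equal, so every one of the $\binom{n_1+n_2}{n_1}$ orderings of the X's and Y's is equally likely. It uses a recursion on the largest observation, in the spirit of Mann and Whitney's recursion formula; the function names are our own. Exact rational arithmetic is used to check the moments in (4).

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_arrangements(n1, n2, u):
    """Orderings of n1 X's and n2 Y's in which exactly u (Y, X) pairs have Y first."""
    if u < 0:
        return 0
    if n1 == 0 or n2 == 0:
        return 1 if u == 0 else 0
    # Largest observation is an X: all n2 Y's precede it, adding n2 to U.
    # Largest observation is a Y: it precedes no X, adding nothing to U.
    return count_arrangements(n1 - 1, n2, u - n2) + count_arrangements(n1, n2 - 1, u)


def null_distribution(n1, n2):
    """Exact P{U = u} for u = 0..n1*n2 when F and G are identical."""
    total = comb(n1 + n2, n1)
    return {u: Fraction(count_arrangements(n1, n2, u), total) for u in range(n1 * n2 + 1)}


n1, n2 = 4, 3
dist = null_distribution(n1, n2)
mean = sum(u * prob for u, prob in dist.items())
variance = sum((u - mean) ** 2 * prob for u, prob in dist.items())
# Agreement with (4): E(U) = n1*n2/2 and var(U) = n1*n2*(n1 + n2 + 1)/12.
assert mean == Fraction(n1 * n2, 2)
assert variance == Fraction(n1 * n2 * (n1 + n2 + 1), 12)
```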

In general, the expectation and variance of U depend on three
parameters, $p$, $\varphi^2$, and $\gamma^2$. However, van Dantzig [1951] and Birnbaum
and Klose [1957] have obtained strict bounds for the variance, as given
in (3), which depend only on p, and hence only on E(U). These are

$$\sigma_U^2 \le n_1 n_2\, p(1 - p) \max(n_1, n_2) \qquad (5)$$
and 
nul r) 120 — if < 2r (6) 
T= 


nad — 1)(m. — Yr — + — 2)r? + pil — »|, 


where $\mu = \min(n_1, n_2)$, $\nu = \max(n_1, n_2)$, and $r = \min(p, 1 - p)$.
They also provided even sharper bounds in the case where Y is known
to be stochastically larger than X. However, these will not be used in
this paper.
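Formula (3) and the bound (5) can be compared numerically. The pair of exponential distributions used below is purely an illustrative assumption and does not come from the paper. It was chosen because $p$, $P\{X_i, X_k < Y_j\}$ and $P\{Y_j, Y_k < X_i\}$ then have closed forms, obtained by integrating against the exponential density. The function names are hypothetical.

```python
def exponential_case(lam, mu):
    """p, phi^2, gamma^2 when X ~ Exp(rate lam) and Y ~ Exp(rate mu) (illustrative choice)."""
    p = mu / (lam + mu)                                                 # P{Y < X}
    both_y_before_x = 1 - 2 * lam / (lam + mu) + lam / (lam + 2 * mu)  # P{Y_j, Y_k < X_i}
    both_x_before_y = 1 - 2 * mu / (lam + mu) + mu / (2 * lam + mu)    # P{X_i, X_k < Y_j}
    phi2 = both_x_before_y - (1 - p) ** 2
    gamma2 = both_y_before_x - p ** 2
    return p, phi2, gamma2


def variance_u(n1, n2, p, phi2, gamma2):
    """Exact variance of U, formula (3)."""
    return n1 * n2 * ((n1 - 1) * phi2 + (n2 - 1) * gamma2 + p * (1 - p))


def variance_upper_bound(n1, n2, p):
    """Upper bound (5), which depends on p alone."""
    return n1 * n2 * p * (1 - p) * max(n1, n2)


n1, n2 = 5, 4
p, phi2, gamma2 = exponential_case(lam=1.0, mu=2.0)
print(round(variance_u(n1, n2, p, phi2, gamma2), 3), round(variance_upper_bound(n1, n2, p), 3))
# 14.222 22.222
```

When the two rates are equal, the parameters reduce to $p = \tfrac12$ and $\varphi^2 = \gamma^2 = 1/12$, and `variance_u` returns the null variance in (4).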

Lehmann [1950] proved that

$$\hat p = U/(n_1 n_2) \qquad (7)$$

is the uniformly minimum variance unbiased estimate of p. Other
results of Lehmann [1951] include a proof of the asymptotic normality
of $\sqrt{n_2}\,(\hat p - p)$, no matter what F and G may be, when two conditions
are satisfied: (i) $n_1/n_2$ remains constant as $n_1, n_2 \to \infty$, and (ii) $p \ne 0, 1$.
The approach to normality is most rapid when p is near $\tfrac12$ and becomes
less rapid the farther p deviates from $\tfrac12$.

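Since, for large sample sizes, the distribution of U is approximately
normal, it is suggested [Birnbaum, 1956] that the normal approximation
be used with $\sigma_U$ replaced by its upper bound in (5). The effect of
substituting the upper bound for $\sigma_U$ is to "widen" the confidence interval
obtained. More precisely, if $1 - \alpha$ is the assigned level of confidence
and M is the square root of the upper bound for $\sigma_U^2$, then

$$P\left\{-u_\alpha \le \frac{U - E(U)}{M} \le u_\alpha\right\} \ge P\left\{-u_\alpha \le \frac{U - E(U)}{\sigma_U} \le u_\alpha\right\} = 1 - \alpha. \qquad (8)$$

For large $n_1$ and $n_2$, $u_\alpha$ may be determined from tables of the normal
distribution with zero expectation and unit variance.
The inequality in the first bracket in (8) may be written as

$$-u_\alpha \le \frac{U - n_1 n_2 p}{\sqrt{n_1 n_2\, p(1 - p) \max(n_1, n_2)}} \le u_\alpha.$$

Squaring and dividing by $(n_1 n_2)^2$ turns this into the quadratic inequality
$(1 + a)p^2 - (2\hat p + a)p + \hat p^2 \le 0$ in p, which holds for p between its two roots,
or $p_L \le p \le p_U$, where the lower bound is

$$p_L = \frac{2\hat p + a - \sqrt{(2\hat p + a)^2 - 4\hat p^2(1 + a)}}{2(1 + a)} \qquad (9)$$

and the upper bound is

$$p_U = \frac{2\hat p + a + \sqrt{(2\hat p + a)^2 - 4\hat p^2(1 + a)}}{2(1 + a)}, \qquad (10)$$

with $a = u_\alpha^2 \max(n_1, n_2)/(n_1 n_2)$.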
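Formulas (7), (9) and (10) translate directly into code. This is a minimal sketch with a hypothetical function name. The user is assumed to supply `u_alpha` from normal tables.

```python
import math


def p_confidence_limits(U, n1, n2, u_alpha):
    """Return (p_hat, p_L, p_U) following (7), (9) and (10)."""
    p_hat = U / (n1 * n2)                        # estimate (7)
    a = u_alpha ** 2 * max(n1, n2) / (n1 * n2)   # the constant a of (9)-(10)
    centre = 2 * p_hat + a
    # The discriminant equals a * (4 * p_hat * (1 - p_hat) + a), so it is never negative.
    root = math.sqrt(centre ** 2 - 4 * p_hat ** 2 * (1 + a))
    denominator = 2 * (1 + a)
    return p_hat, (centre - root) / denominator, (centre + root) / denominator


p_hat, p_lower, p_upper = p_confidence_limits(U=181, n1=25, n2=23, u_alpha=1.28)
print(p_hat, p_lower, p_upper)
```

The paper's example values are U = 181, $n_1 = 25$, $n_2 = 23$ and $u_\alpha = 1.28$. With these inputs the limits come out at about .207 and .447. The worked example reports .206 and .448 because it rounds $\hat p$ and $\hat p^2$ to three decimals before substituting.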


Relationship of the Statistic $H_r$ to the Mann-Whitney U Statistic


If selection is performed on the parental generation in order to divide
it into two groups, one of which can be rated as being above the
median rank for the trait under consideration and the other as below
median rank for the same trait, then the estimator ($H_r$) of heritability
which depends on rank is

$$H_r = f\,\frac{\bar b_2 - \bar b_1}{\bar a_2 - \bar a_1}. \qquad (11)$$

In (11), $(\bar a_2 - \bar a_1)$ is the difference in mean rank between the high
parental group and the low, while $(\bar b_2 - \bar b_1)$ is the difference in the mean
ranks of the progeny of the high parental group and those of the low.
The statistic $H_r$ is seen to be a function of the ranks, among all the
progeny, of those whose parents had rank above the median rank (the
high parental group), so that (11) may be written in terms of the
Mann-Whitney U statistic. If N is the number of parents, the difference
between the mean ranks of the high and the low parental groups is

$$\bar a_2 - \bar a_1 = \frac{2}{N}\left[\sum_{k=(N/2)+1}^{N} k - \sum_{k=1}^{N/2} k\right] = \frac{N}{2}. \qquad (12)$$

Let $R_i$ be the rank, among all the progeny, of the ith offspring from
the low parental group, and $S_i$ be the rank, among all the progeny, of
the ith offspring from the high parental group. Then


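$$\bar b_2 - \bar b_1 = \frac{1}{n_2}\sum_{i=1}^{n_2} S_i - \frac{1}{n_1}\sum_{i=1}^{n_1} R_i = (n_1 + n_2)\left[\frac{\sum_{i=1}^{n_2} S_i}{n_1 n_2} - \frac{n_1 + n_2 + 1}{2n_1}\right], \qquad (13)$$

where $n_1$ is the number of offspring from the low parental group,
and $n_2$ is the number of offspring from the high parental group. (The
second form follows because the ranks $R_i$ and $S_i$ together exhaust
$1, \ldots, n_1 + n_2$.) Since $\sum_{i=1}^{n_2} S_i$, which is the Wilcoxon T, is equal to
$n_1 n_2 + [n_2(n_2 + 1)/2] - U$, the estimator is

$$H_r = \frac{2f(n_1 + n_2)}{N}\left(\frac12 - \frac{U}{n_1 n_2}\right). \qquad (14)$$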
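The identity between (11) and (14) can be checked on a small set of ranks. In this sketch the ranks, N and f are hypothetical values chosen for illustration. N is assumed to be even, and the parental ranks are taken to be $1, \ldots, N$ split at the median, as in (12). The function names are our own.

```python
def h_r_from_ranks(low_ranks, high_ranks, N, f):
    """(11): f times the progeny mean-rank difference over the parental one."""
    half = N // 2
    # (12): parents ranked 1..N and split at the median (N assumed even).
    parental_diff = (sum(range(half + 1, N + 1)) - sum(range(1, half + 1))) / half
    progeny_diff = sum(high_ranks) / len(high_ranks) - sum(low_ranks) / len(low_ranks)
    return f * progeny_diff / parental_diff


def h_r_from_u(low_ranks, high_ranks, N, f):
    """(14): the same statistic written through the Mann-Whitney U."""
    n1, n2 = len(low_ranks), len(high_ranks)
    # U counts (high, low) offspring pairs in which the high-group offspring comes first.
    U = sum(1 for s in high_ranks for r in low_ranks if s < r)
    return 2 * f * (n1 + n2) / N * (0.5 - U / (n1 * n2))


# Hypothetical progeny ranks (a permutation of 1..7) split by parental group.
low, high = [1, 2, 4, 6], [3, 5, 7]
print(h_r_from_ranks(low, high, N=8, f=1), h_r_from_u(low, high, N=8, f=1))
```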
Point Estimate and Confidence Interval for Heritability 


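Heritability will be assumed to be measured by $E(H_r) = \theta$. Since,
by (14), $\theta = 2f(n_1 + n_2)(\tfrac12 - p)/N$, where p is defined as in (2), the
point estimate for heritability is

$$\hat\theta = \frac{2f(n_1 + n_2)}{N}\left(\frac12 - \hat p\right). \qquad (15)$$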


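The limits of the confidence interval for θ are obtained by (9), (10),
and (14). The lower limit is

$$\theta_L = \frac{2f(n_1 + n_2)}{N}\left(\frac12 - p_U\right) \qquad (16)$$

and the upper limit is

$$\theta_U = \frac{2f(n_1 + n_2)}{N}\left(\frac12 - p_L\right). \qquad (17)$$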

Test of the Hypothesis that there is no Heritability 


The test of the hypothesis ($H_0$) that θ is zero against the specific
alternative hypothesis that θ is greater than zero will be precisely the
test of the hypothesis that $p = \tfrac12$ against the alternatives $p < \tfrac12$. By
means of the correspondence of tests to confidence sets, a one-sided
test for $p = \tfrac12$ against $p < \tfrac12$ may be obtained by taking as the rejection
region the set of values of U for which $p_U < \tfrac12$, or equivalently when

$$U < \frac{n_1 n_2}{2} - u_\alpha \sqrt{\frac{n_1 n_2 \max(n_1, n_2)}{4}}.$$

The test is, then, to reject $H_0$ whenever

$$H_r > f\,\frac{u_\alpha (n_1 + n_2)}{N}\sqrt{\frac{\max(n_1, n_2)}{n_1 n_2}}. \qquad (18)$$
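The rejection rule (18) and the equivalent rule on U can be computed side by side. The sketch below is illustrative only, and the function names are hypothetical.

```python
import math


def h_r_threshold(n1, n2, N, f, u_alpha):
    """Right-hand side of (18): reject theta = 0 when H_r exceeds this value."""
    return f * u_alpha * (n1 + n2) / N * math.sqrt(max(n1, n2) / (n1 * n2))


def u_critical(n1, n2, u_alpha):
    """Equivalent rule: reject when U < n1*n2/2 - u_alpha*sqrt(n1*n2*max(n1, n2)/4)."""
    return n1 * n2 / 2 - u_alpha * math.sqrt(n1 * n2 * max(n1, n2) / 4)


def h_r(U, n1, n2, N, f):
    """Estimator (14)."""
    return 2 * f * (n1 + n2) / N * (0.5 - U / (n1 * n2))


n1, n2, N, f, u_alpha = 25, 23, 48, 2, 1.28
threshold = h_r_threshold(n1, n2, N, f, u_alpha)
# At the critical U the statistic sits exactly on the threshold, so the two rules agree.
assert math.isclose(h_r(u_critical(n1, n2, u_alpha), n1, n2, N, f), threshold)
print(round(threshold, 3), h_r(181, n1, n2, N, f) > threshold)  # 0.534 True
```

For the example data, the critical value of U is about 210.8. The observed U = 181 therefore leads to rejection, which matches the conclusion the paper draws from $\hat\theta = .740$.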


Example

Table 1 represents the kind of information the geneticist might
obtain on the social order within a flock of hens. It is desired to
determine the extent to which a dam's social position is inherited by
her daughter(s). Figuratively, the maternal flock can be divided into
two groups, a "high" group within which each member dominates over
one-half of her contemporaries, and a "low" group in which no member
dominates more than one-half of the other birds in the maternal flock.

The actual number of dams in Table 1 is thirty-four, but the effective
number, because of differential reproductive rates, is forty-eight. For
example, a dam which dominates fifty per cent of the
flock and produces two daughters would be expected to have the same
effect on the next generation as two dams, each of which dominates
fifty per cent of the flock and produces only one daughter. Thus, the
value of N in this example will be forty-eight.

Using the information in Table 1, the calculated value of U is

$$U = (25)(23) + \frac{(23)(24)}{2} - 670 = 181.$$


Utilizing this value, the estimate of heritability may be calculated.
However, information has been obtained only on the females. Assuming
that the same cock (or socially equivalent cocks) sired all the progeny,
the ratio in (11) would estimate only the genetic effect of the hen, or
$\tfrac12\theta$; consequently f is equal to 2. Using (7),

$$\hat p = \frac{181}{(25)(23)} = .315,$$

and by (15) the estimate of heritability is

$$\hat\theta = (2)(2)\,\frac{48}{48}\,(.500 - .315) = .740.$$


BIOMETRICS, JUNE 1959 


In order to test whether this estimate is significantly different from
zero, let the rejection level (α) be .10; thus $u_\alpha$ will be 1.28. According
to (18), the test of significance is to reject the hypothesis that θ = 0
when $H_r$ is greater than

$$(2)\,\frac{1.28(25 + 23)}{48}\sqrt{\frac{25}{(25)(23)}} = .534.$$

Since the estimate ($\hat\theta$ = .740) is greater than .534, it is significantly
greater than zero at the .10 level of probability.

If one wishes to place 2-sided confidence limits on θ, the upper and
lower limits of p must first be obtained. By (9) and (10),

$$p_L = \frac{2(.315) + .071 - \sqrt{(.630 + .071)^2 - 4(.099)(1 + .071)}}{2(1 + .071)} = .206$$

and

$$p_U = \frac{2(.315) + .071 + \sqrt{(.630 + .071)^2 - 4(.099)(1 + .071)}}{2(1 + .071)} = .448.$$

Thus the 80 per cent (or greater) confidence limits for θ will be

$$\theta_L = 2(2)\,\frac{48}{48}\,(.500 - .448) = .208,$$

$$\theta_U = 2(2)\,\frac{48}{48}\,(.500 - .206) = 1.176.$$


It is impossible, of course, for more than 100 per cent of the variability
in a trait to be due to genetic causes; thus, the upper limit of θ is
inordinately large from a genetic standpoint. However, it is to be recalled
that an upper bound for $\sigma_U^2$ is used to determine the points $p_L$ and $p_U$;
therefore, these are at least 80 per cent confidence limits. Furthermore,
information only on the female parent is included in this example; thus,
only the dam's genetic effect, or $\tfrac12\theta$, is estimated. Since any error in
the estimate is doubled, the confidence interval will be larger than one
based on an estimate which utilizes information on both parents of
each individual.


Summary 


A method is proposed for the estimation of heritability in the
situation where the experimenter may not be able to assume a normal
distribution for the variate being studied. It requires that the
individuals in the parental and the $F_1$ generation can be ranked on the basis
of the character under study. The heritability of rank can then be
estimated, the estimate tested for significance, and confidence limits
placed on the parameter.
THE REGRESSION ANALYSIS OF CAUSAL PATHS*


MALCOLM E. TURNER** AND CHARLES D. STEVENS


The Kettering Laboratory in the Department of Preventive Medicine and 
Industrial Health, College of Medicine, University of Cincinnati, 
Cincinnati 19, Ohio, U. S. A. 


1. Introduction


The purpose of this presentation is to acquaint biologists and bio- 
metricians with an important tool, path analysis. This tool can be 
of help in dealing with complex causal networks. These often, though 
not always, prove amenable to common regression technics. Path 
analysis, originated by Sewall Wright [1918], is a convenient approach 
to regression problems involving two or more regression equations. 
For those unskilled in statistics, path analysis provides one method 
of depicting regression problems by simple diagram. The path diagram, 
commonly representing the flow of cause and effect, often permits 
one to write estimators of parameters immediately upon inspection. 
Path analysis thus facilitates the process of abstraction for both math- 
ematician and biologist. The analytic process is here explained, two 
computational algorithms (rules-of-thumb) are given, and an example 
involving feedback is detailed. Inclusion of feedback, and thus homeo- 
stasis, is an important feature of this presentation. 

Since Wright’s early work [1918, 1921, 1924, 1934, and others] the 
treatment of multiple equations has been extensively developed in 
econometrics (see especially Hood and Koopmans, [1953]) but generally 
without use of the standardized regression coefficients used by Wright 
or of the path diagrams and algorithms which characterize Wright’s 
technic. Wright himself [1921] used unstandardized coefficients and 
the term path regression, but in general [1954] has favored the
standardized form. Tukey [1954] in a critical review pointed out advantages in
working with unstandardized regression coefficients. Recently
Kempthorne [1957] emphasized the use of explicitly stated equations instead
of reliance upon algorithms. The explicit statement of equations will 
be favored here, but with the use of algorithms for manipulating these 
equations in simple cases. 

For the best exposition of Wright’s views, his 1934 and 1954 papers 
are to be consulted. Li [1956] gave an expository review of Wright’s 
work especially as it concerns population genetics. 

Those without mathematical inclination may wish to omit Section 
2 which follows. It deals with some of the assumptions underlying the 
analytic technic. 


2. The Structure of a System 


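Consider first a situation where p + q quantities $\eta_1, \eta_2, \ldots, \eta_p$
and $\xi_1, \xi_2, \ldots, \xi_q$ are thought of as measurable, and where p causal
relations determine the η's, absolutely and uniquely, by the ξ's. Let
the equations which express these causal interrelations be

$$\begin{aligned}
\eta_1 &= F_1(\xi_1, \xi_2, \ldots, \xi_q;\ \alpha) \\
\eta_2 &= F_2(\xi_1, \xi_2, \ldots, \xi_q;\ \alpha) \\
&\ \ \vdots \\
\eta_p &= F_p(\xi_1, \xi_2, \ldots, \xi_q;\ \alpha)
\end{aligned} \qquad (1)$$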

These equations will be referred to as the structural equations. Each
variable η is assumed to be absolutely and uniquely determined (or
"caused") by the set of ξ-variables. The parameters, denoted by α,
may be thought of as physical, chemical, biological, or psychological 
constants. The algebraic forms of the equations will be determined 
by the physical, chemical, biological, or psychological theory, re- 
spectively, or in the absence of adequate theory simple empirical 
approximations, such as polynomials, may be employed. 

In the above we have dealt solely with variables not subject to 
random disturbances. Now we introduce the random element. Random
variation enters into the ξ's as measurement error but may enter as
response error as well as measurement error in the η's. By response
error is meant such things as, e.g., genetic and physiological variation. 
The random variation may affect the measured quantities in an additive, 
multiplicative, exponential, or other fashion; however, we will consider 
only additive errors. Often, errors can be made additive by suitable 
transformation but this may result in a less convenient algebraic form. 
Suppose now that the quantities which can be measured are not the
η's and ξ's, but rather y's and x's which satisfy the error equations:

$$\begin{aligned}
x_j &= \xi_j + \delta_j, \qquad j = 1, \ldots, q, \\
y_i &= \eta_i + \epsilon_i, \qquad i = 1, \ldots, p,
\end{aligned} \qquad (2)$$

where the ε's and δ's are distributed independently of the η's and ξ's.
Note that the y's and x's are unbiased estimates of the η's and ξ's as
a consequence of equations (2) if in addition the errors are all assumed
to have means of zero. Also note that, while the ξ's and η's are not
correlated with errors, each observable quantity (x or y) is correlated
with its own error.

By solving the structural equations (1) and the error equations (2)
simultaneously to eliminate the η and ξ variables we obtain the model
equations:

$$y_i = F_i(x_1 - \delta_1, x_2 - \delta_2, \ldots, x_q - \delta_q;\ \alpha) + \epsilon_i, \qquad i = 1, \ldots, p. \qquad (3)$$

The statistical problem now is to estimate the a-parameters and if 
possible to place confidence limits about the estimated parameters. 
The statistical procedures to be employed depend upon what assump- 
tions can be made about the distributions of the errors, the nature of 
their interrelations, and the kinds of functions. 

It is fortunate that in biological applications we may often neglect 
the errors in x because the response error, which is contained in y but 
not in x, is usually of much larger magnitude than the measurement
error. When this is the case, such neglect introduces very little bias 
into the estimates of the parameters. In linear systems there is a 
special situation of interest. Berkson [1950] pointed out that when the 
x variables are controlled, that is, when the x’s correspond to the values 
intended but not to the values actually obtained, neglecting the x error 
produces no bias in the estimates. This is a consequence of absence 
of correlation between intended values and errors of preparation. In 
the succeeding sections of this paper we will neglect the error in x 
entirely, justifying our action on the basis of one or the other of the 
above arguments. In making applications of the procedures to be 
shown, one must be careful to ascertain if this action is valid. In some 
cases the analyst will not be primarily concerned with estimating the
constants in the structural equations but instead will wish to predict
y's for given observed x's. In these cases the "best" predictor is obtained
by ignoring the error in x.

In addition to assuming that the errors in x are negligible, we will 
also assume that the errors in y are independently and normally dis- 
tributed with a mean of zero, and that the errors are uncorrelated from 
one y to another and are uncorrelated with the x’s. Finally, in the 
major portion of this paper we will assume that all relations are linear
in functions of the x's but not necessarily in the α's. When these
assumptions are met, it is often possible to obtain rather simple
estimators in quite complex causal networks. If the assumption of
negligible error in x breaks down we are in trouble unless information
concerning ratios of the variances of the y-errors to the variances of
the x-errors is known. See Deming [1943] for general methods of
handling such situations. Non-linear structural relations can be treated 
by iterative procedures and a few remarks will be made later on about 
such applications. Again see Deming. It may not always be realistic 
to assume non-correlation between the errors in different y-variates. 
When this is the case estimation procedures will not always be affected, 
but due to limitation of space only the simpler situations of non-correl- 
ation between errors in y, of structural equations linear in functions of 
x's, and of negligible errors in x will be considered in any detail. The
fact that the x's may be functions of other x's, as in the examples of
Tukey [1954], sections 11 and 20, or as polynomial expressions or as
various transforms of original variables, makes linear methods available
to a wide variety of initially non-linear situations. Note that no
assumptions whatever are made about the distributions of the ξ's and
η's. It is partly due to this fact that we regard this problem as one of
regression, not one of correlation. As a special case either the ξ's or
η's may be randomly distributed in some particular fashion. More
commonly, the values of the ξ's to be studied are chosen by the
experimenter. In either case, the fact of how the ξ's and η's were selected
is irrelevant to the present analysis.


3. Path regression formulation 


We will suppose, as in the previous section, that we have a closed
causal system consisting of q primary factors or causes (ξ's) and p
resultant effects (η's). These p + q variables may then be considered
to be associated one with another by a network of causal pathways. It
is convenient to diagram this network by the device due to Wright of
representing causal pathways by single-headed arrows connecting
cause (tail) to corresponding effect (head). Because among any three
variables of which at least one is a ξ there are six conceivable diagrams,
among four variables sixty-five diagrams, and among five variables 
several hundred diagrams, the selection of the most meaningful and 
promising diagram will be based upon the judgment of the investigator. 

As an example, consider a system in which two primary causes 
jointly determine an effect which, in turn, determines a still different 
effect. The path diagram for this system is: 


The primary factors (ξ's) are indexed by letters of the alphabet and
the resultant effects (η's) by arabic numerals, in this publication.

The character of the primary factors as "causes" or "non-response
variables" is represented by the restriction that an arrow can never
point toward a ξ. There are no other restrictions whatever on the
positioning of the arrows. In fact two arrows pointing in opposite
directions between the same two variables are even permissible and may
be given an interpretation, so long as neither of the variables is a ξ.
In causal regression systems the arrows of the path diagram indicate 
passage of time; in other regression systems, such as calibration diagrams 
and prediction equations, the arrows do not necessarily represent 
passage of time. There is an equivalence between the methodologies 
appropriate to these two types of problems. The method of presen- 
tation in this article is especially suited to causal systems. 

The path diagram may be interpreted in terms of the structural 
equations. The variable at the head of one or more arrows is interpreted 
as being a function of just those variables at the tails of these same arrows. 
It is quite possible for these functions to have any form whatever; 
however, the numerical estimation of parameters in the structural 
equations is usually difficult in any but the linear case. As mentioned 
before, the discussion in this paper for the most part will be limited to 
the case of linear causal relations. 

Assuming linearity then, we can upon inspection write down the 
structural equations which are implied by any path diagram. For the 
previous example we have: 


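$$\begin{aligned}
\eta_1 &= \alpha_1 + \alpha_{1a}\xi_a + \alpha_{1b}\xi_b \\
\eta_2 &= \alpha_2 + \alpha_{21}\eta_1
\end{aligned} \qquad (4)$$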
where the α's containing a double subscript are referred to as path
coefficients* and have been inserted in the diagram. Note that the
first subscript denotes the variable at the head of the arrow and the
second denotes the variable at the tail. The single subscripted α's are
the intercepts where the subscript corresponds to that of the left member
of the equation. There are always precisely p structural equations, one
equation for each effect.
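Writing down the structural equations from a diagram is mechanical enough to automate. In this sketch a diagram is assumed to be given as a list of (tail, head) arrows, with letters for primary factors and numerals for effects, as in the text. The representation and function name are our own. The output follows the subscript convention just described: head first, tail second.

```python
def structural_equations(arrows):
    """Write the linear structural equations implied by a path diagram.

    arrows: (tail, head) pairs. Letters name primary factors (xi) and
    numerals name effects (eta), following the convention of the text.
    """
    tails_by_head = {}
    for tail, head in arrows:
        if head.isalpha():
            raise ValueError("an arrow may never point toward a primary factor")
        tails_by_head.setdefault(head, []).append(tail)

    def symbol(variable):
        return f"xi_{variable}" if variable.isalpha() else f"eta_{variable}"

    equations = []
    for head, tails in sorted(tails_by_head.items()):
        # Intercept alpha_head, then one path coefficient alpha_{head tail} per arrow.
        terms = [f"alpha_{head}"] + [f"alpha_{head}{tail}*{symbol(tail)}" for tail in tails]
        equations.append(f"eta_{head} = " + " + ".join(terms))
    return equations


for line in structural_equations([("a", "1"), ("b", "1"), ("1", "2")]):
    print(line)
# eta_1 = alpha_1 + alpha_1a*xi_a + alpha_1b*xi_b
# eta_2 = alpha_2 + alpha_21*eta_1
```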

As in Section 2 we think of the ξ's and η's as being "true" variables
of which, in general, we do not have any direct measurement. Generally
we will have obtained values of corresponding y's and x's which are
related to the true variables by a set of error equations. The following
conditions are assumed to hold for all subsequent development in this
paper:

(1) $x_j = \xi_j$ for all j (errors in x are negligible).

(2) $y_i = \eta_i + \epsilon_i$ for all i (errors in y are additive).

Now, by combining condition (2), that is, the error equations, with 
the structural equations we obtain the regression equations described 
in the preceding section. For the previous example we obtain by direct 
substitution 


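$$\begin{aligned}
y_1 &= \alpha_1 + \alpha_{1a}x_a + \alpha_{1b}x_b + \epsilon_1 \\
y_2 &= \alpha_2 + \alpha_{21}(y_1 - \epsilon_1) + \epsilon_2
\end{aligned} \qquad (5)$$

as our model equations.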


4. Assumptions and principles underlying estimation 


In addition to assumptions (1) and (2) of the previous section, 
we assume: 


(3) $\epsilon_i$ are normally distributed with mean zero and variance $\sigma_i^2$ for all i.

(4) $\epsilon_i$ are uncorrelated one with another for fixed i and from variable
to variable.

(5) $\epsilon_i$ are uncorrelated with $x_j$ for all i and j.


These differ slightly from the conditions assumed by Tukey [1954]. 
It is possible to elect any one of several estimation principles in 
order to obtain estimators for the path coefficients and intercepts. We 
prefer to use those estimators which maximize the likelihood both 
because of their desirable asymptotic properties and because the
estimators are invariant under transformation. In an important subset
of path schemata the model equations can be reparameterized such
that the scheme can be separated into distinct regression equations,
no two of which contain the same parameters or error terms. When
these regression equations are linear, and because of assumptions (1)
to (5), the Gauss-Markov theorem applies and we have the result that
the maximum likelihood estimators are equivalent to least squares
estimators (see Kempthorne [1952] or Rao [1952]). In fact these
estimators are also equivalent to the "moment" estimators found in some
older books on numerical methods. In other cases, where the Gauss-Markov
theorem does not apply, discussion will revolve about the
maximum likelihood estimators.

*Wright [1921] termed these "path regression" coefficients to distinguish the coefficients of the
type discussed here from the standardized coefficients of Wright. We will use the simpler term where
the distinction is not needed.


5. Estimation in some simple cases 


Any causal network without closed loops or cycles can be thought 
of as made up of three simple kinds of relations. These are: 

(1) Simple regression (ordinary multiple regression). 

(2) Simultaneous regression. 

(3) Chain regression. 
Once the estimation problem has been discussed for these three types, 
the general scheme for treating cycle-free networks of any desired 
complexity is easily found. Networks which contain causal cycles, 
i.e., systems with feedback or homeostasis, require more careful con- 
sideration. 

Simple regression is ordinary straight-line or multiple linear re- 
gression. For example, consider the path diagram 


The structural equation is: 


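$$\eta_1 = \alpha_1 + \alpha_{1a}\xi_a + \alpha_{1b}\xi_b \qquad (6)$$

and the model equation is, by substitution from the error equation,

$$y_1 = \alpha_1 + \alpha_{1a}x_a + \alpha_{1b}x_b + \epsilon_1. \qquad (7)$$

This is, of course, the multiple linear regression model with two
independent variables and the path coefficients are identical with the
ordinary partial regression coefficients. In symbols

$$\alpha_{1a} = \beta_{1a\cdot b}, \qquad \alpha_{1b} = \beta_{1b\cdot a},$$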




where $\beta_{1a\cdot b}$ is the partial regression coefficient of $y_1$ on $x_a$ holding $x_b$
constant and $\beta_{1b\cdot a}$ is the partial regression coefficient of $y_1$ on $x_b$ holding
$x_a$ constant. Hence, if lower case Latin letters are used to represent the
maximum likelihood estimates, then the desired estimates of the
unknown parameters $\alpha_{1a}$, $\alpha_{1b}$, and $\alpha_1$ are

$$a_{1a} = b_{1a\cdot b}, \qquad a_{1b} = b_{1b\cdot a}, \qquad a_1 = \bar y_1 - a_{1a}\bar x_a - a_{1b}\bar x_b.$$

The estimates $b_{1a\cdot b}$ and $b_{1b\cdot a}$ are obtained in the usual way and the
procedure will not be duplicated here; e.g., see Snedecor [1956] or
Mather [1946]. It will be seen that many of the causal networks
satisfying the five conditions of Sections 3 and 4 will have solutions in
terms of ordinary total and partial regression coefficients.
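With two primary factors, the partial regression coefficients have a closed form in terms of corrected sums of squares and products, and that closed form is all the estimates above require. Below is a dependency-free Python sketch with hypothetical names. The data are noise-free values generated from (7) with assumed parameters, so the fit should recover those parameters exactly.

```python
def mean(values):
    return sum(values) / len(values)


def cross_product(u, v):
    """Corrected sum of products: sum((u_i - mean(u)) * (v_i - mean(v)))."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))


def two_predictor_fit(y, xa, xb):
    """Return (a_1, b_1a.b, b_1b.a): intercept and partial regression coefficients."""
    saa, sbb, sab = cross_product(xa, xa), cross_product(xb, xb), cross_product(xa, xb)
    say, sby = cross_product(xa, y), cross_product(xb, y)
    det = saa * sbb - sab ** 2                 # zero only if x_a and x_b are collinear
    b_a = (say * sbb - sby * sab) / det        # y on x_a, x_b held constant
    b_b = (sby * saa - say * sab) / det        # y on x_b, x_a held constant
    return mean(y) - b_a * mean(xa) - b_b * mean(xb), b_a, b_b


# Hypothetical noise-free data from (7) with alpha_1 = 1, alpha_1a = 2, alpha_1b = -0.5.
xa = [1.0, 2.0, 3.0, 4.0, 5.0]
xb = [2.0, 1.0, 4.0, 3.0, 6.0]
y1 = [1.0 + 2.0 * a - 0.5 * b for a, b in zip(xa, xb)]
print(two_predictor_fit(y1, xa, xb))
```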

The case of simultaneous regressions is similarly straightforward. 
It is the case of two or more effects which are determined by the same 
primary factor. As an example consider the path diagram 


The structural equations are: 


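$$\begin{aligned}
\eta_1 &= \alpha_1 + \alpha_{1a}\xi_a \\
\eta_2 &= \alpha_2 + \alpha_{2a}\xi_a
\end{aligned} \qquad (8)$$

and the model equations are

$$\begin{aligned}
y_1 &= \alpha_1 + \alpha_{1a}x_a + \epsilon_1 \\
y_2 &= \alpha_2 + \alpha_{2a}x_a + \epsilon_2.
\end{aligned} \qquad (9)$$

Due to the assumptions of Sections 3 and 4 the errors are independently
distributed and in effect the maximum likelihood or least squares
estimators can be found separately for each equation. But since each
equation is just the ordinary linear model, estimators are

$$a_{1a} = b_{1a}, \qquad a_{2a} = b_{2a}, \qquad a_1 = \bar y_1 - a_{1a}\bar x_a, \qquad a_2 = \bar y_2 - a_{2a}\bar x_a,$$

where $b_{1a}$ is the ordinary estimated regression coefficient of $y_1$ on $x_a$
and $b_{2a}$ is the ordinary regression coefficient of $y_2$ on $x_a$.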

The third basic type is less familiar. If a primary factor determines 
an effect and if this effect in turn determines still another effect we speak 
of the network as being a chain regression. The path diagram of the 
simplest example is 


The structural equations are: 


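$$\begin{aligned}
\eta_1 &= \alpha_1 + \alpha_{1a}\xi_a \\
\eta_2 &= \alpha_2 + \alpha_{21}\eta_1
\end{aligned} \qquad (10)$$

which give rise to the following model equations

$$\begin{aligned}
y_1 &= \alpha_1 + \alpha_{1a}x_a + \epsilon_1 \\
y_2 &= \alpha_2 + \alpha_{21}(y_1 - \epsilon_1) + \epsilon_2.
\end{aligned} \qquad (11)$$

The error $\epsilon_1$ is common to both equations, so the parameters in the
two equations cannot be separately estimated as in the previous case.
However, if $\eta_1$ is eliminated from the second structural equation by
substitution of the right member of the first structural equation we
get a modified pair of equations which are referred to as the reduced
structural equations. In this case we have

$$\begin{aligned}
\eta_1 &= \alpha_1 + \alpha_{1a}\xi_a \\
\eta_2 &= (\alpha_2 + \alpha_{21}\alpha_1) + (\alpha_{21}\alpha_{1a})\xi_a
\end{aligned} \qquad (12)$$

Reduced regression equations are then found:

$$\begin{aligned}
y_1 &= \alpha_1 + \alpha_{1a}x_a + \epsilon_1 \\
y_2 &= (\alpha_2 + \alpha_{21}\alpha_1) + (\alpha_{21}\alpha_{1a})x_a + \epsilon_2
\end{aligned} \qquad (13)$$

These are now independent and the solutions may be found in terms of
the ordinary regression coefficients. The second of these is now linear
in new parameters (enclosed in parentheses) and there is no common
error term so that separate estimation is possible. The maximum
likelihood estimators for line 1 are easily found as before to be

$$a_{1a} = b_{1a}, \qquad a_1 = \bar y_1 - a_{1a}\bar x_a,$$

and since $(\alpha_{21}\alpha_{1a})$ is the slope of line 2 we have

$$a_{21}a_{1a} = b_{2a} \quad\text{or}\quad a_{21} = b_{2a}/b_{1a},$$

$$a_2 + a_{21}a_1 = \bar y_2 - b_{2a}\bar x_a \quad\text{or}\quad a_2 = \bar y_2 - b_{2a}\bar x_a - a_{21}a_1.$$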


The final estimates are maximum likelihood estimates due to the 
property of invariance of maximum likelihood estimators under trans- 
formation. 
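The chain-regression estimates can be sketched in the same way. As the reduced regression equations (13) require, $y_2$ is regressed on $x_a$ and not on $y_1$. The function names and data are hypothetical. The data are noise-free, so the sketch should recover the assumed true values $\alpha_1 = 1$, $\alpha_{1a} = 0.5$, $\alpha_2 = 2$ and $\alpha_{21} = 3$.

```python
def simple_fit(y, x):
    """Intercept and slope of the ordinary least squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def chain_estimates(xa, y1, y2):
    a_1, b_1a = simple_fit(y1, xa)   # line 1: a_1a = b_1a, a_1 = ybar_1 - a_1a * xbar_a
    c_2, b_2a = simple_fit(y2, xa)   # line 2: intercept a_2 + a_21*a_1, slope a_21*a_1a
    a_21 = b_2a / b_1a
    a_2 = c_2 - a_21 * a_1           # equals ybar_2 - b_2a * xbar_a - a_21 * a_1
    return {"a_1": a_1, "a_1a": b_1a, "a_21": a_21, "a_2": a_2}


xa = [0.0, 1.0, 2.0, 3.0, 4.0]
y1 = [1.0 + 0.5 * a for a in xa]    # eta_1 with alpha_1 = 1, alpha_1a = 0.5
y2 = [2.0 + 3.0 * e for e in y1]    # eta_2 with alpha_2 = 2, alpha_21 = 3
print(chain_estimates(xa, y1, y2))
```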

It can now be seen that non-cyclical combinations of the three 
kinds of causal networks can be treated by first eliminating all η's from
the right hand members of the structural equations by substitution 
and then equating the resulting compound coefficients to the corre- 
sponding estimates of the appropriate ordinary and partial regression 
coefficients. This will give a set of simultaneous non-linear estimation 
equations which can sometimes be solved for the estimates of the path 
regression coefficients. The case of cyclical or feedback regression can 
be similarly handled, but consideration of this case will be delayed 
until after consideration of a slightly more complex example. 


6. Example illustrating the process of estimation 


The simplest case illustrating combination of simple, simultaneous, 
and chain regressions is diagrammed as follows: 


[Path diagram: $\xi_a \to \eta_1$, $\xi_b \to \eta_1$, $\xi_a \to \eta_2$, $\eta_1 \to \eta_2$]


We will now obtain the estimators as functions of the ordinary partial 
regression coefficients. We proceed stepwise in the following manner: 
(1) Write down the structural equations from the path diagram. 


There are two of these since p = 2. We have 


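$$\begin{aligned}
\eta_1 &= \alpha_1 + \alpha_{1a}\xi_a + \alpha_{1b}\xi_b \\
\eta_2 &= \alpha_2 + \alpha_{2a}\xi_a + \alpha_{21}\eta_1
\end{aligned} \qquad (14)$$

(2) Eliminate $\eta_1$ from the right member of the second equation
above by substitution of the right member of the first equation.
This gives the reduced structural equations:

$$\begin{aligned}
\eta_1 &= \alpha_1 + \alpha_{1a}\xi_a + \alpha_{1b}\xi_b \\
\eta_2 &= (\alpha_2 + \alpha_{21}\alpha_1) + (\alpha_{2a} + \alpha_{1a}\alpha_{21})\xi_a + (\alpha_{1b}\alpha_{21})\xi_b
\end{aligned} \qquad (15)$$

(3) Substitute x's for ξ's and y's for η's and attach the additive
errors to give the reduced regression equations:

$$\begin{aligned}
y_1 &= \alpha_1 + \alpha_{1a}x_a + \alpha_{1b}x_b + \epsilon_1 \\
y_2 &= (\alpha_2 + \alpha_{21}\alpha_1) + (\alpha_{2a} + \alpha_{1a}\alpha_{21})x_a + (\alpha_{1b}\alpha_{21})x_b + \epsilon_2
\end{aligned} \qquad (16)$$

(4) The estimating equations can then be written down by
inspection:

$$\begin{aligned}
a_{1a} &= b_{1a\cdot b}, & a_{1b} &= b_{1b\cdot a}, \\
a_{2a} + a_{1a}a_{21} &= b_{2a\cdot b}, & a_{1b}a_{21} &= b_{2b\cdot a}, \\
a_1 &= \bar y_1 - b_{1a\cdot b}\bar x_a - b_{1b\cdot a}\bar x_b, & a_2 + a_1a_{21} &= \bar y_2 - b_{2a\cdot b}\bar x_a - b_{2b\cdot a}\bar x_b.
\end{aligned}$$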


These are then solved for the estimates of the intercepts and path 
coefficients. 
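Solving the estimating equations of step (4) is a matter of back substitution. First $a_{1b}$ gives $a_{21}$ through $a_{1b}a_{21} = b_{2b\cdot a}$, and then $a_{2a}$ and $a_2$ follow. The sketch below uses noise-free hypothetical data generated from (14) with chosen parameter values, so the estimates should reproduce those values. The function names are our own.

```python
def mean(values):
    return sum(values) / len(values)


def cross_product(u, v):
    """Corrected sum of products of u and v."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))


def two_predictor_fit(y, xa, xb):
    """Intercept and partial regression coefficients of y on x_a and x_b."""
    saa, sbb, sab = cross_product(xa, xa), cross_product(xb, xb), cross_product(xa, xb)
    say, sby = cross_product(xa, y), cross_product(xb, y)
    det = saa * sbb - sab ** 2
    b_a = (say * sbb - sby * sab) / det
    b_b = (sby * saa - say * sab) / det
    return mean(y) - b_a * mean(xa) - b_b * mean(xb), b_a, b_b


def section6_estimates(xa, xb, y1, y2):
    """Solve the step (4) estimating equations by back substitution."""
    a_1, a_1a, a_1b = two_predictor_fit(y1, xa, xb)       # a_1a = b_1a.b, a_1b = b_1b.a
    c_2, b_2a_b, b_2b_a = two_predictor_fit(y2, xa, xb)   # c_2 estimates a_2 + a_1*a_21
    a_21 = b_2b_a / a_1b                                   # a_1b * a_21 = b_2b.a
    a_2a = b_2a_b - a_1a * a_21                            # a_2a + a_1a * a_21 = b_2a.b
    a_2 = c_2 - a_1 * a_21
    return {"a_1": a_1, "a_1a": a_1a, "a_1b": a_1b, "a_2": a_2, "a_2a": a_2a, "a_21": a_21}


# Hypothetical true values, with noise-free data generated from (14).
true = {"a_1": 1.0, "a_1a": 2.0, "a_1b": -1.0, "a_2": 0.5, "a_2a": 1.5, "a_21": 0.25}
xa = [1.0, 2.0, 3.0, 4.0, 5.0]
xb = [2.0, 1.0, 4.0, 3.0, 6.0]
y1 = [true["a_1"] + true["a_1a"] * a + true["a_1b"] * b for a, b in zip(xa, xb)]
y2 = [true["a_2"] + true["a_2a"] * a + true["a_21"] * e for a, e in zip(xa, y1)]
estimates = section6_estimates(xa, xb, y1, y2)
print(estimates)
```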


7. Rules for writing down the estimators directly from the path diagram 
when no feedback is present 


It would be convenient to be able to avoid the algebra of step (2) 
in the above example. This is quite simple to do if no cycles of causation 
or feedback exist in the system. It can be seen that, for each effect, 
there are exactly as many partial or simple regression coefficients 
available as estimators as there are primary factors which ultimately 
affect the particular effect. Each of these partial or simple regression 
coefficients may be termed total path regressions. Now, if a compound 
path regression is the product of the elementary path coefficients along 
any one path from a particular primary factor to a particular effect, 
then we can state: 


Rule 1. A total path regression between a primary factor and an effect 
is the sum of the compound path regressions connecting the 
primary factor and the effect. 


As an example of the above rule consider the problem of the previous
section. Let us find the estimator associated with the partial regression
coefficient of $\eta_2$ on $\xi_a$ holding $\xi_b$ constant ($\beta_{2a\cdot b}$). This coefficient
represents the total path regression between $\xi_a$ and $\eta_2$. This total path
regression is to be set equal to the sum of the compound path regressions. There
are two such paths between $\xi_a$ and $\eta_2$, one direct path with
coefficient $\alpha_{2a}$ and the other through $\eta_1$ with coefficient $\alpha_{1a}\alpha_{21}$. Hence
$b_{2a\cdot b} = a_{2a} + a_{1a}a_{21}$, which is one of the four estimators based on partial
regression coefficients that was obtained before.*
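Rule 1 amounts to a sum over every directed path from a primary factor to an effect, taking for each path the product of the elementary path coefficients along it. Below is a minimal recursive sketch. It assumes an acyclic diagram stored as a dictionary keyed by (head, tail), mirroring subscripts such as $\alpha_{21}$ and $\alpha_{1a}$. The coefficient values are hypothetical.

```python
def total_path_regression(coefficients, source, effect):
    """Sum over all directed paths source -> effect of the product of path coefficients.

    coefficients maps (head, tail) to the elementary path coefficient alpha_{head tail}.
    The diagram is assumed to be acyclic (no feedback), so the recursion terminates.
    """
    if source == effect:
        return 1.0
    total = 0.0
    for (head, tail), value in coefficients.items():
        if tail == source:
            total += value * total_path_regression(coefficients, head, effect)
    return total


# The Section 6 diagram with hypothetical coefficient values.
alpha = {("1", "a"): 2.0, ("1", "b"): -1.0, ("2", "a"): 1.5, ("2", "1"): 0.25}
print(total_path_regression(alpha, "a", "2"))  # 2.0, i.e. alpha_2a + alpha_1a * alpha_21
print(total_path_regression(alpha, "b", "2"))  # -0.25, i.e. alpha_1b * alpha_21
```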

We may state, similarly, a rule for writing the estimators involving 
the intercepts. In Rule 1 we employed the partial regression co- 
efficients in the multiple regression equations which relate each effect 
η to all of its determining causes ξ. These partial regression coefficients
we termed total path regressions. Analogously, we may speak of the 
total intercepts as being the intercepts of these same multiple regression 
equations. Then we have 


Rule 2. A total intercept for a particular effect $\eta_j$ is the sum of the
particular intercept $\alpha_j$ and the products of all intercepts of
effects determining $\eta_j$ by the elementary path regression
connecting the determining effect and $\eta_j$.


An example should make the application of Rule 2 clear. Again,
consider the problem of Section 6. The only "effect" determining $\eta_2$ is
$\eta_1$, and hence the total intercept for $\eta_2$ is to be set equal to $\alpha_2 + \alpha_{21}\alpha_1$,
where $\alpha_{21}$ is the elementary path regression between the determining
effect $\eta_1$ and the particular effect being considered, $\eta_2$. Thus we get the
estimator obtained earlier

$$a_2 + a_{21}a_1 = \bar y_2 - b_{2a\cdot b}\bar x_a - b_{2b\cdot a}\bar x_b.$$

The right member of this estimator is the usual estimate of the intercept
of the multiple regression equation relating $\eta_2$ to $\xi_a$ and $\xi_b$.
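Rule 2 can be sketched in the same way. The example in the text involves only one determining effect. For longer chains, this sketch reads the rule recursively and uses the total intercept of each determining effect. That reading is our own assumption, chosen because it agrees with the reduced structural equations. The names and numbers are hypothetical.

```python
def total_intercept(intercepts, coefficients, effect):
    """Own intercept plus, for each effect with an arrow into `effect`,
    the path coefficient times that effect's total intercept.

    intercepts maps each effect to alpha_j; coefficients maps (head, tail) to alpha_{head tail}.
    """
    total = intercepts[effect]
    for (head, tail), value in coefficients.items():
        if head == effect and tail in intercepts:  # only effects (eta's) carry intercepts
            total += value * total_intercept(intercepts, coefficients, tail)
    return total


alpha = {("1", "a"): 2.0, ("1", "b"): -1.0, ("2", "a"): 1.5, ("2", "1"): 0.25}
intercepts = {"1": 1.0, "2": 0.5}
print(total_intercept(intercepts, alpha, "2"))  # 0.75, i.e. alpha_2 + alpha_21 * alpha_1
```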


8. Identification of parameters 


In the examples of Sections 5 and 6 the parameters are said to be
just identified because each parameter can be uniquely
expressed as a function of the simple and partial regression coefficients.
For this to be so it is necessary (but not sufficient) that there be the
same number of path coefficients as there are simple and partial
regressions. Although many causal schemes conform to this condition, the
majority do not. Over-identification occurs when there are more
regression coefficients than there are path coefficients, under-identification
when there are fewer. We will consider briefly in the next two sections
what can be done (1) if the parameters are over-identified and (2) if
they are under-identified.

*Note that a total path regression is a total derivative of an effect with respect to a primary factor
and that the path coefficients are partial derivatives obtained from the functional equations. Thus
in the present example $d\eta_2/d\xi_a = \partial\eta_2/\partial\xi_a + (\partial\eta_2/\partial\eta_1)(\partial\eta_1/\partial\xi_a) = \alpha_{2a} + \alpha_{21}\alpha_{1a}$.


9. Over-identification 


As an example of over-identification consider the example of Section 
3. Using Rule 1 of Section 7 we get the following estimation equations 
for the path regressions: 


It is obvious that $a_{21}$ can be estimated either by $b_{2a.b}/b_{1a.b}$ or by $b_{2b.a}/b_{1b.a}$. In general these two estimates will not be numerically the same, and it is apparent that any efficient estimator would have to utilize both sources of information. In cases of over-identification such as the one above, maximum likelihood estimators may be found. In general, however, they do not have an explicit form and hence require an iterative solution. Methods utilizing only part of the available information have been developed; see Hood and Koopmans [1953]. Full information methods may present numerical difficulties. Nevertheless, we feel that in any serious study the dictum of inductive inference that "all available information must be considered for a valid induction" should not be violated.


10. Under-identification and factor analysis 


A condition considerably more serious than that of over-identifi- 
cation is the case of under-identified parameters. This occurs when 
regression information is inadequate for individual estimation of each 
parameter. Consider the following diagram: 
Maximum likelihood estimators are found by Rule 1 to be
bis = boa = Aza + 


This presents the impossible task of solving for three quantities knowing only two relations. For a determinate solution, additional information will have to be provided. For example, if it were known that $a_{1a}$ and $a_{2a}$ were in some given ratio (equal, say), then a complete solution could be found. Even in the absence of such knowledge, information regarding the relative magnitudes of $a_{1a}$ and $a_{2a}$ can be provided by supplying such restrictions as $a_{1a}^2 + a_{2a}^2 = 1$ or $a_{1a}a_{2a} = 1$, etc. The second of these possible restrictions assumes that the coefficients are of the same sign. The restriction $a_{1a}a_{2a} = -1$ could be used if the signs were known to be opposite. Other restrictions will be sensible in special cases. The reader is reminded that the character of the solution obtained will depend directly upon the character of the restrictions supplied.

A case in point is the collection of procedures, widely used in the social sciences, known as factor analysis. Factor analysis may be thought of as the ultimate in under-identification, since no regression information is available at all. One of the models postulated (see Lawley [1953]) is the system of linear equations corresponding to the path diagram:

A set of observable variables, whose true values are denoted by $\eta_1, \eta_2, \dots, \eta_p$, are considered to be linear functions of some common factors $\xi_1, \xi_2, \dots, \xi_q$, where $q < p$. These common factors are either unknown or unmeasurable. The path coefficients $a_{11}, a_{12}, \dots, a_{pq}$ are referred to as factor loadings by the factor analysts and define the composition of each measurable variable in terms of the unknown factors. The path coefficients may just as well be thought of as defining the factors in terms of the knowables, and it is in this latter sense that the factor analyst usually regards them. In any case, the problem is to estimate the path coefficients or factor loadings where no information is provided by regression. This means that $pq$ a priori restrictions must be supplied. As there are infinitely many ways of doing this, it is not surprising that several schools of thought have arisen regarding which restrictions to use. In general, however, methods have been advocated which maximize the contribution of a single factor, then that of a second factor, and so on until the least important factor is reached. The method of principal components advocated by Hotelling adds to this ordering the restriction that the factors are orthogonal to one another (see Holzinger and Harman [1941] for a survey of models and methods of factor analysis).

The possibility should be pointed out of combining the situation of factor analysis, in which none of the $\xi$'s is measured, with the regression situation. Thus, we envisage a model which contains some measurable factors and some non-measurable ones, either or both of which might be entangled in regression chains. This model is especially appropriate in biology, where various kinds of collateral information are often available.


11. Feedback 


A causal process often involves one or more cycles of causation. 
Homeostatic mechanisms provide a variety of biometric examples. 
Consider the diagram below: 




What is the meaning of this diagram? Let us suppose that the variables $\xi_a$, $\xi_b$, $\eta_1$, and $\eta_2$ have some particular values. Now suppose that $\xi_a$ is changed by some specific amount. The effect is to change $\eta_1$, either in the same direction or in the opposite direction, depending upon the sign of $a_{1a}$. Suppose $a_{1a}$ is positive and we increase $\xi_a$. Then $\eta_1$ is also increased. Since $a_{21}$ is not zero, $\eta_2$ will also be affected. If $a_{21}$ is positive, $\eta_2$ will also be increased. The non-nullity of $a_{12}$ indicates that this change in $\eta_2$ will further affect $\eta_1$. In such a situation we say that there is "feedback." If $a_{12}$ is of the same sign as $a_{21}$, then $\eta_1$ will increase further, and this will cause a still further increase in $\eta_2$. Thus the process continually builds up until there is a breakdown in the system. This type of system, in which all of the processes in the loop go in the same direction, is said to possess positive feedback. Positive feedback is obviously unstable. On the other hand, stability or equilibrium may be attained if some of the processes go in opposite directions; we then say that the system has "negative feedback." If $a_{12}$ were negative, then $\eta_1$ would decrease, and perhaps the process would settle down to some stable or equilibrium position. Relative magnitudes as well as directions are important in producing equilibrium. In this example control is provided by $\xi_b$ as well as by $\xi_a$. When both $\xi_a$ and $\xi_b$ are changed, the flow of cause and effect therefore cycles around the loop until equilibrium is reached, or else breakdown or oscillation occurs.

The equilibrium is said to be "stationary" if $\eta_1$ and $\eta_2$ settle down to the same final values for any particular change in $\xi_a$ or $\xi_b$. The equilibrium can be truly stationary only in the degenerate case in which $a_{1a}$ and $a_{2b}$ are zero. However, this condition is approached if the product $a_{12}a_{21}$ of the feedback path coefficients is very large in absolute magnitude. A non-stationary equilibrium is spoken of as a "moving" equilibrium.

When such a process as is diagrammed above is in equilibrium, it is possible to use the methods of this paper in estimating the relevant constants. The simple rules of Section 7 require modification. However, the more general method outlined in Section 5 applies.

Wright treated the case of feedback in a manner analogous to the present treatment some years ago, but did not publish this material because new algorithms would have had to be introduced. A method of handling feedback using auxiliary variables was, however, published; see Wright [1924, 1934]. The present treatment does not rely upon algorithms and hence cannot be objected to on Wright's previous grounds. The authors feel that avoiding auxiliary variables makes the logical presentation simpler. The maximum likelihood estimators are


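$$\hat a_{21} = \frac{b_{2a.b}}{b_{1a.b}}, \qquad \hat a_{12} = \frac{b_{1b.a}}{b_{2b.a}}, \qquad \hat a_{1a} = b_{1a.b}\,(1 - \hat a_{12}\hat a_{21}), \qquad \hat a_{2b} = b_{2b.a}\,(1 - \hat a_{12}\hat a_{21}).$$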


12. A biometrical example of feedback regression 


The classical data obtained by Haldane and Priestley, from which these authors inferred a mechanism for the control of the depth of breathing, afford an interesting example of feedback in a biological system. The following data for Haldane himself were given in the original memoir [1905].
TABLE 1
HALDANE-PRIESTLEY DATA ON DEPTH OF BREATHING

% carbon dioxide     % carbon dioxide      Depth of respiration
inhaled ($\xi_a$)     in the alveoli ($\eta_1$)   in cubic centimeters ($\eta_2$)


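We suggest the following representation of the causal pathways:

[Path diagram: $\xi_a \rightarrow \eta_1$ ($a_{1a}$); $\eta_1 \rightarrow \eta_2$ ($a_{21}$); $\eta_2 \rightarrow \eta_1$ ($a_{12}$)]

where $\xi_a$ is the per cent carbon dioxide inhaled, $\eta_1$ is the per cent carbon dioxide in the alveoli of the lungs, and $\eta_2$ is the depth of respiration.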

An advantage of path analysis is that each part of the total process is explicitly represented without regard to other parts of the process. Thus, the path coefficient in one part is invariant when other parts of the process are changed. In the present example we may ask: "What is the value of $a_{1a}$ when $a_{21}$ and $a_{12}$ are both zero?" Whatever this value is, it must be the same for any values of $a_{21}$ and $a_{12}$. Now, $a_{21} = a_{12} = 0$ is tantamount to having the lungs disconnected from the neural reflex arc. Suppose a certain time is allowed for equilibrium to be reached before observation is made. It is then obvious that the concentration of CO$_2$ in the alveoli will be the same as the concentration in the inhaled air, so $a_{1a}$ must be very nearly equal to unity. From the nature of path coefficients, $a_{1a}$ must be equal to unity whatever the true values of $a_{12}$ and $a_{21}$ are. The structural equations are then
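$$\eta_1 = a_1 + \xi_a + a_{12}\eta_2, \qquad \eta_2 = a_2 + a_{21}\eta_1, \tag{17}$$

and by substitution the reduced regression equations are

$$y_1 = \frac{a_1 + a_{12}a_2 + x_a}{1 - a_{12}a_{21}} + \varepsilon_1, \qquad y_2 = \frac{a_2 + a_{21}a_1 + a_{21}x_a}{1 - a_{12}a_{21}} + \varepsilon_2. \tag{18}$$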

It is seen from (17) that the response can be represented by a line which is the intersection of two planes. One of these planes is parallel to the line $\eta_1 = a_{12}\eta_2$ and to the line $\eta_1 = \xi_a$. The other is parallel to the $\xi_a$ axis and to the line $\eta_1 = (1/a_{21})\eta_2$. Thus, the path coefficients may be represented as the slopes of certain projections of planes.

The degree of approach to the condition of stationary equilibrium is seen to be a function of the denominator in (18), i.e., of $1 - a_{12}a_{21}$. If this quantity is much greater than 1, the equilibrium becomes nearly stationary. If the quantity is only slightly greater than 1, there is poor compensation. If the quantity is less than 1, there is positive feedback and instability.
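The substitution from (17) to (18) and the role of the denominator can be checked numerically, as in the short sketch below. It solves the two structural equations jointly by Cramer's rule and compares the result with the reduced form. It also evaluates $1 - a_{12}a_{21}$ at the path-coefficient estimates reported below for Haldane's data. The intercepts `a1`, `a2` and the input level `xi` are arbitrary illustrative values, and the function names are hypothetical.

```python
def solve_structural(a1, a2, a12, a21, xi):
    """Solve eta1 = a1 + xi + a12*eta2 and eta2 = a2 + a21*eta1 jointly.

    The system is [[1, -a12], [-a21, 1]] @ [eta1, eta2] = [a1 + xi, a2];
    it is solved here by Cramer's rule.
    """
    det = 1.0 - a12 * a21
    if det == 0.0:
        raise ValueError("1 - a12*a21 = 0: no unique equilibrium")
    r1, r2 = a1 + xi, a2
    return (r1 + a12 * r2) / det, (a21 * r1 + r2) / det


def reduced_form(a1, a2, a12, a21, xi):
    """Reduced equations (18) without the error terms."""
    den = 1.0 - a12 * a21
    return (a1 + a12 * a2 + xi) / den, (a2 + a21 * a1 + a21 * xi) / den


# Path coefficients estimated in the text; a1, a2 and xi are arbitrary.
a12, a21 = -0.002764, 977.1
a1, a2, xi = 0.5, 100.0, 2.0
print(solve_structural(a1, a2, a12, a21, xi))
print(reduced_form(a1, a2, a12, a21, xi))
gain = 1.0 / (1.0 - a12 * a21)   # d(eta1)/d(xi) at equilibrium
print(round(1.0 - a12 * a21, 2), round(gain, 4))  # 3.7 0.2702
```

With these estimates the denominator is about 3.7. A unit rise in inhaled CO$_2$ is therefore damped to roughly 0.27 of a unit in the alveoli, which is the slope $b_{1a}$ quoted in Section 13.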

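From (18), the regression coefficients for the regressions of $y_1$ and $y_2$ on $x_a$ are found to be

$$b_{1a} = \frac{S_{1a}}{S_{aa}} = \frac{1}{1 - a_{12}a_{21}}, \qquad b_{2a} = \frac{S_{2a}}{S_{aa}} = \frac{a_{21}}{1 - a_{12}a_{21}},$$

where $S_{aa}$, $S_{1a}$, and $S_{2a}$ are the usual "corrected" sums of squares and cross-products. Hence

$$\hat a_{21} = \frac{b_{2a}}{b_{1a}} = 977.1, \qquad \hat a_{12} = \frac{b_{1a} - 1}{b_{2a}} = -0.002764.$$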
These estimates of the path coefficients are of opposite sign as they 
should be for equilibrium or negative feedback situations. Figure 1 
shows the estimated line of response as the intersection of the estimated 
planes corresponding to (17). 
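This estimation is indirect: both path coefficients come from the two simple regressions on $x_a$. The minimal sketch below reproduces it. The helper `slope` computes $S_{xy}/S_{xx}$ from raw data, and `path_coefficients` inverts the two relations above. Both names are illustrative. The usage line feeds in the slopes $b_{1a} = 0.2702$ and $b_{2a} = 264.0$ quoted in Section 13, since the individual observations of Table 1 are not reproduced here.

```python
from typing import Sequence, Tuple


def slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope S_xy / S_xx from corrected sums of squares/products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def path_coefficients(b1a: float, b2a: float) -> Tuple[float, float]:
    """Invert b1a = 1/(1 - a12*a21) and b2a = a21/(1 - a12*a21)."""
    a21 = b2a / b1a
    a12 = (b1a - 1.0) / b2a
    return a21, a12


a21_hat, a12_hat = path_coefficients(0.2702, 264.0)
print(round(a21_hat, 1), round(a12_hat, 6))  # 977.1 -0.002764
```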

It is, of course, possible to test first all regression coefficients for 
significance before using them in estimation equations. Lack of sig- 
nificance of one or more regression coefficients would allow one to 
anticipate wide confidence limits for certain of the path coefficients. 
FIGURE 1
ESTIMATED LINE OF RESPONSE AS INTERSECTION OF PLANES FITTED TO DATA OF HALDANE AND PRIESTLEY.


The standard test for the slope $b_{1a}$ yields a Student's $t$ of 6.0 with 13 degrees of freedom. For the regression of $y_2$ on $x_a$ we obtain a $t$ of 18.4 with 13 degrees of freedom. Both are, of course, highly significant, and we feel justified in having estimated the path coefficients.

The results of the present analysis are consistent with the conclusion of Haldane and Priestley "that the smallest increase in the CO$_2$ percentage of the air breathed is accompanied by a compensatory increase in the alveolar ventilation, the latter increase being just about sufficient to keep the alveolar CO$_2$ percentage constant." The difference is that a quantitative interpretation is placed upon the phrase "just about sufficient." The estimated path coefficients might possibly be used in comparing individuals, in following the course of a pulmonary disease in a single individual, or in the differential diagnosis of disease.


13. Confidence Limits 


The problem of confidence limits for the path coefficients is not easily resolved in general, even for the case of linear structural equations. When the path diagram is void of chain and cyclical regressions, the exact confidence limits of multiple linear regression apply, and the reader should refer to one of the standard sources alluded to in Section 4. In many cases, as in the example of the previous section, estimators of the path coefficients are ratios of linear functions of ordinary regression coefficients. In these cases satisfactory approximate results are easily obtained. We illustrate the technique with the example taken from Haldane and Priestley. We previously found that


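$$\hat a_{21} = \frac{b_{2a}}{b_{1a}}.$$

Let

$$T_{21} = b_{2a} - a_{21}b_{1a}.$$

Then the expected value of $T_{21}$ is zero, and the estimated variance of $T_{21}$ is given by

$$s^2 = s_{2a}^2 + a_{21}^2 s_{1a}^2,$$

where $s_{1a}^2$ and $s_{2a}^2$ are the estimated variances of $b_{1a}$ and $b_{2a}$, respectively. Since the two regression coefficients are normally distributed, $T_{21}$ is normally distributed, and we may define a test criterion $v$ by the equation

$$v^2 = \frac{(b_{2a} - a_{21}b_{1a})^2}{s_{2a}^2 + a_{21}^2 s_{1a}^2}.$$

If we knew the value of $v$ corresponding to a given confidence level, we could invert this equation and obtain

$$a_{21} = \frac{b_{1a}b_{2a} \pm v\sqrt{b_{1a}^2 s_{2a}^2 + b_{2a}^2 s_{1a}^2 - v^2 s_{1a}^2 s_{2a}^2}}{b_{1a}^2 - v^2 s_{1a}^2}. \tag{19}$$

Taking the sign positive gives one limit, and taking the sign negative gives the other, with confidence at the chosen level.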

If the statistics $s_{1a}^2$ and $s_{2a}^2$ were estimates of a common variance, then $s^2$ could be written $s'^2(1 + a_{21}^2)$, where $s'^2$ is found by pooling $s_{1a}^2$ and $s_{2a}^2$ and would be distributed as $\chi^2$ with 13 degrees of freedom. In this case $v$ would be distributed as Student's $t$ with 13 degrees of freedom. In the present example a common variance is not assumed, and $s^2$ is a linear function of separately estimated mean squares. Satterthwaite [1946], following Fairfield Smith [1936], discusses an approximation whereby $s^2$ is taken to be distributed as $\chi^2$ with degrees of freedom calculated by equating lower moments. If

$$s^2 = \lambda_1 s_1^2 + \lambda_2 s_2^2 + \cdots,$$

then Satterthwaite proposes that $s^2$ be approximated by $\sigma^2\chi^2/n$, where $\chi^2$ has $n$ degrees of freedom given by

$$n = \frac{(\lambda_1 s_1^2 + \lambda_2 s_2^2 + \cdots)^2}{\lambda_1^2 s_1^4/n_1 + \lambda_2^2 s_2^4/n_2 + \cdots},$$

and $n_1, n_2, \dots$ are the degrees of freedom available for estimating $s_1^2, s_2^2, \dots$, respectively. In the present example $\lambda_1 = 1$ and $\lambda_2 = a_{21}^2$, which is unknown. To a first approximation we can substitute $\hat a_{21}^2$ for $a_{21}^2$. Now if $s^2$ is distributed approximately as $\sigma^2\chi^2/n$, then $v$ is distributed approximately as Student's $t$ with $n$ degrees of freedom. We have from the example

$$b_{1a} = 0.2702,\quad s_{1a}^2 = 0.002019; \qquad b_{2a} = 264.0,\quad s_{2a}^2 = 204.7,$$

and $n$ is calculated to be 15.73. Interpolating in a table of Student's $t$ distribution, we find the tabular values of $v$ for various levels of confidence given in column 2 of Table 2. Approximations based upon the Behrens-Fisher test and Welch's test, obtained by substituting $\hat a_{21}^2$ for $a_{21}^2$, are available in the present case since only two mean squares are involved. Values of $v$ for these two tests are also given in Table 2. See Fisher and Yates [1948] and Pearson and Hartley [1954] for descriptions of these two tests and tables of critical values. It should be mentioned in passing that these tests are not logically equivalent and test somewhat different hypotheses.
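The degrees-of-freedom formula above can be evaluated directly, as in the small sketch below (the function name is hypothetical). The text does not state $n_1$ and $n_2$. The sketch assumes 13 for each, the residual degrees of freedom quoted for the $t$ tests in Section 12, and with that assumption it reproduces the reported value of 15.73.

```python
from typing import Sequence


def satterthwaite_df(lams: Sequence[float], s2: Sequence[float],
                     dfs: Sequence[float]) -> float:
    """Approximate degrees of freedom of s^2 = sum(lam_i * s2_i), where each
    s2_i is an independent mean square on dfs[i] degrees of freedom."""
    num = sum(l * s for l, s in zip(lams, s2)) ** 2
    den = sum((l * s) ** 2 / f for l, s, f in zip(lams, s2, dfs))
    return num / den


# Haldane-Priestley example: s^2 = s2_2a + a21^2 * s2_1a, with a21 replaced by
# its estimate b2a/b1a; 13 d.f. per variance is an assumption (see lead-in).
b1a, s2_1a = 0.2702, 0.002019
b2a, s2_2a = 264.0, 204.7
a21_hat = b2a / b1a
n = satterthwaite_df([1.0, a21_hat ** 2], [s2_2a, s2_1a], [13, 13])
print(round(n, 2))  # 15.73
```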

Taking $v = 1.8$, we get by substitution into (19) the approximate 90% confidence limits for $a_{21}$,


whereas the point estimate was 980 after rounding to two significant figures.

In order to find confidence limits for $a_{12}$ we use

$$T_{12} = b_{1a} - 1 - a_{12}b_{2a}, \qquad s^2 = s_{1a}^2 + a_{12}^2 s_{2a}^2.$$

Corresponding values of $v$ are given on the right side of Table 2. Taking $v = 1.7$ at the 90% level, we obtain

$$-0.0032 < a_{12} < -0.0024.$$

The point estimate was previously found to be $-0.0028$.
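The inversion in (19) amounts to solving a quadratic in the unknown ratio. This is the construction usually associated with Fieller. The sketch below makes the same assumptions as the text: independent, normally distributed regression coefficients and a known value of $v$. The function name is hypothetical. For $a_{12}$, the sketch uses the contrast $T_{12} = b_{1a} - 1 - a_{12}b_{2a}$ shown above. The values $v = 1.8$ and $v = 1.7$ are the ones quoted in the text.

```python
import math
from typing import Tuple


def ratio_limits(p: float, q: float, s2p: float, s2q: float,
                 v: float) -> Tuple[float, float]:
    """Limits for a where E(p - a*q) = 0, from (p - a*q)^2 = v^2 (s2p + a^2 s2q).

    p and q are treated as independent with estimated variances s2p and s2q.
    A finite interval requires q^2 > v^2 * s2q.
    """
    A = q * q - v * v * s2q
    B = -2.0 * p * q
    C = p * p - v * v * s2p
    if A <= 0:
        raise ValueError("q is not significantly different from zero at this v")
    root = math.sqrt(B * B - 4.0 * A * C)  # positive whenever A > 0
    return (-B - root) / (2.0 * A), (-B + root) / (2.0 * A)


b1a, s2_1a = 0.2702, 0.002019
b2a, s2_2a = 264.0, 204.7
lo21, hi21 = ratio_limits(b2a, b1a, s2_2a, s2_1a, v=1.8)         # a21 = b2a/b1a
lo12, hi12 = ratio_limits(b1a - 1.0, b2a, s2_1a, s2_2a, v=1.7)   # a12
print(round(lo12, 4), round(hi12, 4))  # -0.0032 -0.0024
```

The interval is not symmetric about the point estimate. This is expected whenever the denominator coefficient is not known very precisely, as with $b_{1a}$ here.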


TABLE 2
VALUES OF THE TEST CRITERION v FOR DIFFERENT LEVELS OF CONFIDENCE

                 For $a_{21}$                              For $a_{12}$
Level of     Smith-          Welch   Behrens-      Smith-          Welch   Behrens-
confidence   Satterthwaite           Fisher        Satterthwaite           Fisher
90%          1.75 1.75 1.7% 1.70
95%          2.12 2.18 2.06 2.17
98%          2.59 2.60 2.48 2.47
99%          2.93 Seed 3.01 2.78 2.95


14. Non-linear processes


The extension of path analysis to non-linear situations is straightforward although, with the exception of polynomials, simple estimators are not available. We close this paper with an example of simple chain regression in which each process is exponential rather than linear. We have the diagram


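The functional equations are

$$\eta_1 = \gamma_{10} + \gamma_{11}\exp(\gamma_{12}\xi), \qquad \eta_2 = \gamma_{20} + \gamma_{21}\exp(\gamma_{22}\eta_1). \tag{20}$$

The procedure of Section 5 is applicable, and the model equations are

$$y_1 = \gamma_{10} + \gamma_{11}\exp(\gamma_{12}x) + \varepsilon_1,$$
$$y_2 = \gamma_{20} + \gamma_{21}\exp\{\gamma_{22}[\gamma_{10} + \gamma_{11}\exp(\gamma_{12}x)]\} + \varepsilon_2.$$

To estimate the unknown parameters:
(1) fit the first model equation by some iterative method (see Stevens [1951]);
(2) replace $\gamma_{10}$, $\gamma_{11}$, and $\gamma_{12}$ in the second model equation by their estimated values from step (1);
(3) fit the second model equation by the same method as was used for the first equation.

We note that the non-linear analogues of the linear path coefficients are the partial derivatives

$$\frac{\partial\eta_1}{\partial\xi} = \gamma_{11}\gamma_{12}\exp(\gamma_{12}\xi) = \gamma_{12}(\eta_1 - \gamma_{10}), \qquad \frac{\partial\eta_2}{\partial\eta_1} = \gamma_{21}\gamma_{22}\exp(\gamma_{22}\eta_1) = \gamma_{22}(\eta_2 - \gamma_{20}). \tag{21}$$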


These have been indicated in the path diagram. 
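The sketch below implements the three-step procedure, assuming the exponential forms written in (20). In place of Stevens's iterative method it uses a simpler profile search. For a fixed rate parameter the model is linear in the other two parameters, so those are fitted in closed form. The rate is then located by a coarse grid search and refined by golden-section search. The search brackets, the function names and the noise-free illustrative data are all assumptions, not values from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_linear(z: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Closed-form least squares for y = alpha + beta*z; returns (alpha, beta, SSE)."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    if szz == 0.0:
        return my, 0.0, sum((yi - my) ** 2 for yi in y)
    beta = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz
    alpha = my - beta * mz
    sse = sum((yi - alpha - beta * zi) ** 2 for zi, yi in zip(z, y))
    return alpha, beta, sse


def fit_exponential(x, y, lo, hi, grid=400, iters=100):
    """Fit y = g0 + g1*exp(g2*x): (g0, g1) profiled out, g2 searched in [lo, hi]."""
    def sse(g2):
        return fit_linear([math.exp(g2 * xi) for xi in x], y)[2]
    step = (hi - lo) / grid
    k = min(range(grid + 1), key=lambda i: sse(lo + i * step))  # coarse grid
    a, b = max(lo, lo + (k - 1) * step), min(hi, lo + (k + 1) * step)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):  # golden-section refinement around the best grid point
        c, d = b - phi * (b - a), a + phi * (b - a)
        if sse(c) < sse(d):
            b = d
        else:
            a = c
    g2 = (a + b) / 2.0
    g0, g1, _ = fit_linear([math.exp(g2 * xi) for xi in x], y)
    return g0, g1, g2


# Illustrative noise-free data from assumed parameter values.
x = [float(i) for i in range(10)]
eta1 = [1.0 + 2.0 * math.exp(-0.3 * xi) for xi in x]
y1 = eta1
y2 = [0.5 + 1.5 * math.exp(0.4 * e) for e in eta1]

g10, g11, g12 = fit_exponential(x, y1, lo=-2.0, hi=-0.01)           # step (1)
eta1_hat = [g10 + g11 * math.exp(g12 * xi) for xi in x]              # step (2)
g20, g21, g22 = fit_exponential(eta1_hat, y2, lo=0.01, hi=2.0)      # step (3)

# Non-linear path coefficients (21) evaluated at each observation.
p1 = [g12 * (e - g10) for e in eta1_hat]                 # d(eta1)/d(xi)
eta2_hat = [g20 + g21 * math.exp(g22 * e) for e in eta1_hat]
p2 = [g22 * (e2 - g20) for e2 in eta2_hat]               # d(eta2)/d(eta1)
```

Unlike the linear case, the path coefficients from (21) vary with the level of the variables, so they are reported here at each observation.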
A CLASS OF TWO REPLICATE INCOMPLETE BLOCK
DESIGNS¹


J. Roy²
Department of Statistics 
University of North Carolina 
Chapel Hill, North Carolina, U.S.A. 


Summary 


Two replicate incomplete block designs for comparative trials are 
useful when experimental units are costly and/or when experimental 
error is small. Not many are known [1], [3], [7], [8], [9]. In this paper 
a new class of two replicate designs called Simple Partially Linked 
Block designs is introduced. It is shown that with any of these designs, 
the variance of the estimate of the difference in effects of two treatments
can be of at most seven different types. The general procedure of
intra- and inter-block analysis is developed and illustrated with a 
numerical example. A list of these designs involving ten or fewer 
plots per block is given together with the values of parameters required 
in the analysis and the values of the efficiency-factor. It turns out 
that most of these designs are highly efficient with an efficiency-factor 
of the order of 75 per cent. It is indicated how other two replicate 
designs can be derived from these designs. 


1. Introduction 


New designs have sometimes been constructed [9], [12], [14] by dualization, that is, by interchanging the roles of the blocks and treatments of a given design. Consider a design D* involving v* treatments in b* blocks, each of k* plots, such that each treatment occurs at most once in each block and altogether on r* plots. It is easy to see that the dual design D will involve v = b* treatments in b = v* blocks, each of k = r* plots, and that each treatment will occur on r = k* plots. It has also been shown [13] that the efficiency-factor E of the design D is given by

$$E = \frac{(b^* - 1)E^*}{(b^* - v^*)E^* + (v^* - 1)}, \tag{1.1}$$


¹This research was supported by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command. Reproduction in whole or in part is permitted for any purpose of the United States Government.
²Present address: Indian Statistical Institute, Calcutta-35, India.
where E* is the efficiency-factor of the design D*. Consequently

$$E \gtreqless E^* \quad\text{according as}\quad b^* \gtreqless v^*.$$

Therefore, if we start with a design D* with a reasonably high efficiency-factor in which b* > v*, dualizing it always gives a design whose efficiency-factor is still higher.
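Relation (1.1) and the inequality that follows from it can be checked with exact rational arithmetic using a short helper (the name `dual_efficiency` is hypothetical). The comment inside the helper offers a gloss of our own, given as a sketch and not as the proof in [13]. The dual shares the $v^* - 1$ canonical efficiencies of D*, gains $b^* - v^*$ further efficiencies equal to one, and E is the harmonic mean of them all. The value $E^* = 5/11$ in the usage is the one implied by $E = 140/248$ for the design of Section 4.

```python
from fractions import Fraction


def dual_efficiency(E_star, v_star, b_star):
    """Efficiency-factor E of the dual D of a design D* with v* treatments,
    b* blocks and efficiency-factor E*, from (1.1).
    Gloss: E is the harmonic mean of b* - 1 canonical efficiencies, v* - 1 of
    them shared with D* and b* - v* of them equal to one."""
    return (b_star - 1) * E_star / ((b_star - v_star) * E_star + (v_star - 1))


E_star = Fraction(5, 11)   # E* implied for the D* used in Section 4
print(dual_efficiency(E_star, v_star=10, b_star=15))  # 35/62
```

Exact fractions make the comparison $E \gtreqless E^*$ free of rounding error.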

Partially balanced association schemes with two associate classes, introduced in [2], are classified in [4] and listed in [5]. Though not exhaustive theoretically, [4] and [5] cover all the known cases. The five types of association schemes discussed in [4] are: (1) Group Divisible (GD), (2) Triangular (T), (3) Latin Square (LS), (4) Cyclic (C), and (5) Simple (Sl). Consider any partially balanced association scheme with two associate classes and parameters $n_i$, $p^k_{ij}$ ($i, j, k = 1, 2$). From it, a partially balanced incomplete block design D* with k* = 2 plots per block can always be constructed by forming one block from each pair of treatments that are first associates. In this design D* there are obviously $b^* = \tfrac12 n_1(n_1 + n_2 + 1)$ blocks, and each treatment occurs on $r^* = n_1$ plots. Any pair of treatments occurs together in one block or in none, according as they are first or second associates. Some of these designs are discussed in [6]. The dual D of such a partially balanced incomplete block design with two plots per block will be called a Simple Partially Linked Block design.


2. Simple partially linked block designs 


An allocation of v treatments in b blocks, each of k plots, forms a Simple Partially Linked Block (SPLB) design if the following conditions are satisfied:


(i) Each treatment occurs on at most one plot in a block and altogether on two plots.

(ii) Any two blocks have at most one treatment in common.

(iii) Two blocks are first (second) associates if they have one (no) treatment in common, and this association scheme is partially balanced with parameters $n_1$, $n_2$, and $p^k_{ij}$ ($i, j, k = 1, 2$).


Thus,

$$v = \tfrac12 n_1(n_1 + n_2 + 1), \qquad b = n_1 + n_2 + 1, \qquad k = n_1,$$

and, of course, the number of replications for each treatment is $r = 2$.


An SPLB design can be easily analysed by the P-method described in [11]. Let $B_i$ denote the total for all plots in the $i$th block and $T_j$ the total for all plots getting the $j$th treatment. We shall write $\{T\}_i$ for the sum of the totals $T_j$ of all treatments which occur in the $i$th block. The adjusted total $P_i$ for the $i$th block is then given by

$$2P_i = 2B_i - \{T\}_i. \tag{2.1}$$

For intra-block estimation, the block-effects are first estimated by means of the formula

$$b_i = c\,(2P_i) + c_1 S_1(2P_i), \tag{2.2}$$

where $S_1$ denotes summation over the first associates of the $i$th block, and

$$c = a/A, \qquad c_1 = 1/A, \tag{2.3}$$

where

$$a = n_1 + p^2_{11} - p^1_{11}, \qquad A = (n_1 + n_2 + 1)\,p^2_{11} = b\,p^2_{11}. \tag{2.4}$$

The intra-block estimate of the effect of the $j$th treatment is then given (except for an arbitrary additive constant) by

$$t_j = \tfrac12\left[T_j - \{b\}_j\right], \tag{2.5}$$

where $\{b\}_j$ denotes the sum of the $b_i$'s for the two blocks in which the $j$th treatment occurs.

Take two treatments, say the $j$th and the $u$th. Two cases may arise: (X) the two treatments occur together in a block, or (Y) they do not.

In case (X) there are three blocks in which at least one of the two treatments occurs. In one of these blocks both treatments occur. Consider the other two blocks. We shall say that the $j$th and $u$th treatments form a pair of type X$_1$ if these two blocks are first associates, and of type X$_2$ if these two blocks are second associates.

In case (Y) there are four blocks: the $j$th treatment occurs in two of them and the $u$th treatment in the other two. From these four blocks it is possible to form four different pairs of blocks such that each pair contains one block with the $j$th treatment and one block with the $u$th treatment. If $\nu$ is the number of first associate pairs among these four pairs of blocks, we shall say that the $j$th and the $u$th treatments form a pair of type Y$_\nu$ ($\nu = 0, 1, 2, 3, 4$).

We have thus classified all possible pairs of treatments into seven distinct types: X$_1$, X$_2$ and Y$_0$, Y$_1$, Y$_2$, Y$_3$, Y$_4$.
Consider now the variance of the intra-block estimate $t_j - t_u$ of $\theta_j - \theta_u$, the difference between the effects of the $j$th and the $u$th treatments. After a little computation, it is seen that

$$\operatorname{Var}(t_j - t_u) = v_{ju}\,\sigma^2,$$

where $\sigma^2$ is the intra-block error variance. The value of $v_{ju}$ depends on the type of pair formed by the $j$th and the $u$th treatments and is tabulated below:

Type of pair of treatments     Value of $v_{ju}$
X$_1$                           $1 + c - c_1$
X$_2$                           $1 + c$
Y$_0$                           $1 + 2c + 2c_1$
Y$_1$                           $1 + 2c + c_1$
Y$_2$                           $1 + 2c$
Y$_3$                           $1 + 2c - c_1$
Y$_4$                           $1 + 2c - 2c_1$

We thus see that seven different precisions are possible in all.

To compute the efficiency-factor E of the SPLB design, we observe that the efficiency-factor E* of the dual design is given by

$$E^* = \frac{n_1 + n_2}{2n_1\left[c(n_1 + n_2) - n_1c_1\right]},$$

and therefore, from (1.1),

$$E = \frac{(v - 1)A}{(v - b)A + 2n_1\left[a(b - 1) - n_1\right]}. \tag{2.6}$$

Let $G$ denote the sum and $G_2$ the sum of squares of all the observations. The components in the analysis of variance are then computed in the following order:
- the total sum of squares (ss), $T = G_2 - G^2/2v$;
- the unadjusted block ss, $S_b' = (1/k)\sum B_i^2 - G^2/2v$;
- the unadjusted treatment ss, $S_t' = \tfrac12\sum T_j^2 - G^2/2v$;
- the adjusted block ss, $S_b = \sum b_iP_i$.

The adjusted treatment ss is then given by $S_t = S_b + S_t' - S_b'$, and the error ss by $S_e = T - S_b' - S_t = T - S_t' - S_b$.
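The parameter formulas (2.3), (2.4) and (2.6) and the table of $v_{ju}$ are transcribed into code below as a small calculator. Exact fractions are used, so the results can be compared directly with Table 3.1 and with Section 4. The function names and the string labels "X1", …, "Y4" are hypothetical. The example scheme is the one used in Section 4: $n_1 = 3$, $n_2 = 6$, $p^1_{11} = 0$, $p^2_{11} = 1$.

```python
from fractions import Fraction


def splb_parameters(n1, n2, p1_11, p2_11):
    """SPLB parameters from a two-class association scheme on b = n1 + n2 + 1
    blocks: v, b, k and a, A, c, c1 of (2.3)-(2.4), plus E from (2.6)."""
    b = n1 + n2 + 1
    v = n1 * b // 2
    k = n1
    a = n1 + p2_11 - p1_11
    A = b * p2_11
    c, c1 = Fraction(a, A), Fraction(1, A)
    E = Fraction((v - 1) * A, (v - b) * A + 2 * n1 * (a * (b - 1) - n1))
    return dict(v=v, b=b, k=k, a=a, A=A, c=c, c1=c1, E=E)


def variance_multiplier(pair_type, c, c1):
    """v_ju in Var(t_j - t_u) = v_ju * sigma^2 for the seven pair types."""
    table = {
        "X1": 1 + c - c1, "X2": 1 + c,
        "Y0": 1 + 2 * c + 2 * c1, "Y1": 1 + 2 * c + c1, "Y2": 1 + 2 * c,
        "Y3": 1 + 2 * c - c1, "Y4": 1 + 2 * c - 2 * c1,
    }
    return table[pair_type]


par = splb_parameters(3, 6, 0, 1)
print(par["v"], par["a"], par["A"], float(par["E"]))
print(variance_multiplier("X2", par["c"], par["c1"]))  # 7/5
```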
To make a combined intra- and inter-block analysis, the first step is to estimate the ratio $\delta$ of the intra-block error variance to the inter-block error variance. This estimate is provided by the statistic $d$ of (2.7), in the sense that the ratio of the expectations of the numerator and the denominator of $d$ is equal to $\delta$. One then computes for the $i$th block

$$\hat b_i = \hat c\,(2P_i) + \hat c_1 S_1(2P_i), \tag{2.8}$$

where

$$\hat c = \hat a/\hat A, \qquad \hat c_1 = 1/\hat A, \tag{2.9}$$

$$\hat a = a + 2d, \qquad \hat A = A + 2d(a + n_1) + 4d^2. \tag{2.10}$$

The combined estimate of the effect of the $j$th treatment is then given (except for an arbitrary additive constant) by

$$\hat t_j = \tfrac12\left[T_j - \{\hat b\}_j\right], \tag{2.11}$$

where $\{\hat b\}_j$ denotes the sum of the $\hat b_i$'s for the two blocks in which the $j$th treatment occurs.


3. A list of simple partially linked block designs with ten or fewer plots per block


A list of SPLB designs with $k \le 10$ derivable from known partially balanced association schemes is presented here. The list is arranged in increasing order of $v$ and, for the same $v$, in increasing order of $k$. The values of the parameters $v$, $b$, $k = n_1$, $a$, $A$, $E$ and the type of the association scheme are shown. Of the designs listed, the lattice designs are, of course, well known, and a few others with a GD type of association scheme are given in [9]. The other designs are new.


TABLE 3.1
LIST OF SPLB DESIGNS WITH k ≤ 10

No.    v    k = n1    b     a     A      E      Association scheme
 1      4     2       4     4     8    0.600    GD
 2      9     3       6     6    18    0.667    GD
 3     12     4       6     6    24    0.750    T
 4     15     3      10     4    10    0.565    T
 5     16     4       8     8    32    0.714    GD
 6     18     4       9     5    18    0.680    LS
 7     24     6       8     8    48    0.807    GD
 8     25     5      10    10    50    0.750    GD
 9     27     6       9     9    54    0.796    GD
10     30     6      10     7    40    0.782    T
11     36     6      12    12    72    0.778    GD
12     39     6      13     7    39    0.760    C
13     40     8      10    10    80    0.841    GD
14     45     6      15     8    45    0.755    T
15     48     6      16     6    32    0.740    LS
16     48     8      12    12    96    0.829    GD
17     49     7      14    14    98    0.800    GD
18     54     9      12    12   108    0.848    GD
19     57     6      19     7    38    0.738    Sl
20     60     8      15     8    60    0.812    T
21     60    10      12    12   120    0.863    GD
22     64     8      16    16   128    0.818    GD
23     68     8      17     9    68    0.807    C
24     72     9      16    11    96    0.833    LS
25     75    10      15    15   150    0.854    GD
26     81     9      18    18   162    0.833    GD
27    100     8      25     7    50    0.784    LS
28    100    10      20    20   200    0.846    GD
29    105    10      21     9    84    0.836    T
30    105    10      21    13   126    0.841    T
31    130    10      26    11   104    0.832    Sl
32    135    10      27    14   135    0.835    Sl
33    180    10      36     8    72    0.813    LS


4. Numerical illustration


To illustrate the numerical procedure, let us consider the following artificial data (Table 4.1). They give the plan and the yields of a randomized experiment with an SPLB design involving 15 treatments in 10 blocks, each of 3 plots. The figures in brackets indicate the serial numbers of the treatments, and the figures below them are the corresponding yields.


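TABLE 4.1
PLAN AND YIELDS

Block 1     (13)   (6)    (3)
            4.5    5.8    4.2
Block 2     (10)   (13)   (7)
            9.9    5.3    6.7
Block 3     (3)    (1)    (2)
            4.4    2.3    4.6
Block 4     (7)    (8)    (9)
            3.9    7.1    0.8
Block 5     (6)    (4)    (5)
            2.2    3.5    5.3
Block 6     (14)   (11)   (4)
            4.7    5.1    3.8
Block 7     (10)   (11)   (12)
            6.3    5.7    5.8
Block 8     (15)   (8)    (5)
            4.9    8.0    7.5
Block 9     (12)   (15)   (1)
            7.3    4.2    2.4
Block 10    (2)    (9)    (14)
            8.6    3.0    5.4


The design was obtained by dualising the partially balanced design D*

Blocks:      1    2    3     4    5    6     7    8    9    10   11   12   13    14   15
Treatments:  1,8  1,9  1,10  2,6  2,7  2,10  3,5  3,7  3,9  4,5  4,6  4,8  5,10  6,9  7,8

with v* = 10, b* = 15, k* = 2, r* = 3, and having a triangular type of association scheme with the parameters

$$n_1 = 3,\quad n_2 = 6, \qquad \begin{pmatrix} p^1_{11} & p^1_{12} \\ p^1_{21} & p^1_{22} \end{pmatrix} = \begin{pmatrix} 0 & 2 \\ 2 & 4 \end{pmatrix}, \qquad \begin{pmatrix} p^2_{11} & p^2_{12} \\ p^2_{21} & p^2_{22} \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}.$$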
2 4 
TABLE 4.2

Block   $B_i$    $\{T\}_i$   $2P_i$   First associates   $S_1(2P_i)$   $b_i$    $\hat b_i$
$i$                                     of $i$
1       14.5     26.4         2.6     2, 3, 5             −2.8         0.76     0.4867
2       21.9     36.6         7.2     1, 4, 7             −7.8         2.10     1.3458
3       11.3     26.5        −3.9     1, 9, 10            10.4        −0.52    −0.4780
4       11.8     29.5        −5.9     2, 8, 10            17.9        −0.57    −0.6351
5       11.0     28.1        −6.1     1, 6, 8              5.4        −1.90    −1.1893
6       13.6     28.2        −1.0     5, 7, 10            −3.7        −0.77    −0.3813
7       17.8     40.1        −4.5     2, 6, 9              7.1        −1.09    −0.7507
8       20.4     37.0         3.8     4, 5, 9            −11.1         0.41     0.4265
9       13.9     26.9         0.9     3, 7, 8             −4.6        −0.10     0.0209
10      17.0     27.1         6.9     3, 4, 6            −10.8         1.68     1.1546
Total   153.2    306.4        0*                           0*           0*       0*

Treatment   $T_j$   Blocks in which        $\{b\}_j$   $t_j$      $\{\hat b\}_j$   $\hat t_j$
$j$                 treatment $j$ occurs
1           4.7     3, 9                   −0.62       2.66       −0.4571         2.579
2           13.2    3, 10                   1.16       6.02       +0.6766         6.262
3           8.6     1, 3                    0.24       4.18       +0.0087         4.296
4           7.3     5, 6                   −2.67       4.98       −1.5706         4.435
5           12.8    5, 8                   −1.49       7.14       −0.7628         6.781
6           8.0     1, 5                   −1.14       4.57       −0.7026         4.351
7           10.6    2, 4                    1.53       4.54       +0.7107         4.945
8           15.1    4, 8                   −0.16       7.63       −0.2086         7.654
9           3.8     4, 10                   1.11       1.34       +0.5195         1.640
10          16.2    2, 7                    1.01       7.60       +0.5951         7.802
11          10.8    6, 7                   −1.86       6.33       −1.1320         5.966
12          13.1    7, 9                   −1.19       7.14       −0.7298         6.915
13          9.8     1, 2                    2.86       3.47       +1.8325         3.984
14          10.1    6, 10                   0.91       4.60       +0.7733         4.663
15          9.1     8, 9                    0.31       4.40       +0.4474         4.326
Total       153.2                           0*        76.60**     +0.0003*       76.599**

**Check: Sum is G/2.
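For this design, we have

$$a = n_1 + p^2_{11} - p^1_{11} = 3 + 1 - 0 = 4,$$
$$A = b\,p^2_{11} = 10 \times 1 = 10,$$
$$c = a/A = 0.4, \qquad c_1 = 1/A = 0.1,$$
$$E = \frac{(v - 1)A}{(v - b)A + 2n_1[a(b - 1) - n_1]} = \frac{(15 - 1) \times 10}{(15 - 10) \times 10 + 2 \times 3\{4 \times (10 - 1) - 3\}} = \frac{140}{50 + 6 \times 33} = \frac{140}{248} = 0.565.$$
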
To carry out the analysis of variance, we compute:

$G = 153.2$, $n = 30$, $G^2/n = 782.341$;
$G_2 = 901.20$, $T = G_2 - G^2/n = 118.859$;
$\sum B_i^2 = 2477.96$, $S_b' = \tfrac13\sum B_i^2 - G^2/n = 43.646$;
$\sum T_j^2 = 1737.78$, $S_t' = \tfrac12\sum T_j^2 - G^2/n = 86.549$;
$S_b = \sum b_iP_i = \tfrac12(52.812) = 26.406$, $S_t = S_b + S_t' - S_b' = 69.309$;
$S_e = T - S_b' - S_t = 5.904$.


TABLE 4.3
ANALYSIS OF VARIANCE

Variation due to         Sum of     Degrees of   Sum of     Variation due to
                         squares    freedom      squares
Blocks (unadjusted)       43.646      9           26.406    Blocks (adjusted)
Treatments (adjusted)     69.309     14           86.549    Treatments (unadjusted)
Error                      5.904      6            5.904    Error
Total                    118.859     29          118.859    Total


To test whether treatment differences are significant, we compute the variance-ratio

$$F = \frac{69.309/14}{5.904/6} = 5.031,$$

which, with 14 and 6 degrees of freedom, is significant at the 5 per cent level.

To test any particular treatment difference, say that between treat- 
ments 1 and 2, we proceed as follows: The best intra-block estimate of 
the difference is 


t, = —3.36. 
Now, treatments 1 and 2 occur together in block 3 and the other blocks 
in which they occur are: block 9 (in which treatment 1 occurs) and block 
10 (in which treatment 2 occurs). But the pair of blocks 9 and 10 are 
second associates because they do not have a treatment in common. 


Hence, the treatments 1 and 2 form a pair of type $X_1$. The variance
of $(t_1 - t_2)$ is thus

$$(1 + 0.4)\sigma^2 = 1.40\sigma^2,$$

and this is estimated by

$$1.4 \times 5.904/6 = 1.3776,$$

and the standard error is

$$\sqrt{1.3776} = 1.17371.$$

We then have the Student ratio

$$t = \frac{-3.36}{1.17371} \approx -2.86.$$
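The standard error and Student ratio follow from three printed numbers: the multiplier 1.40 of σ² for this pair of treatments, the error sum of squares 5.904 on 6 degrees of freedom, and the intra-block estimate −3.36. A minimal Python check, with illustrative variable names:

```python
import math

multiplier = 1.40        # variance of the difference = 1.40 * sigma^2 (as printed)
error_ms = 5.904 / 6     # error mean square, 6 degrees of freedom
diff = -3.36             # intra-block estimate of the difference, treatments 1 and 2

var_diff = multiplier * error_ms      # estimated variance of the difference
se_diff = math.sqrt(var_diff)         # its standard error
t_ratio = diff / se_diff              # Student ratio, 6 degrees of freedom

print(round(var_diff, 4), round(se_diff, 5), round(t_ratio, 3))
```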


For combined estimation, we have:

$$\hat{a} = a + 2d = 5.68206,$$
$$\hat{\Delta} = \Delta + 2d(a + n_1) + 4d^2 = 24.60375,$$
$$\hat{c} = \hat{a}/\hat{\Delta} = 0.23094,$$
$$\hat{c}_1 = 1/\hat{\Delta} = 0.040644.$$
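The combined-estimation constants can be checked in the same way. The text does not print d itself, so the sketch below recovers it from the printed relation a + 2d = 5.68206; it also reads the n in (a + n) as n₁ = 3, which is the value that reproduces the printed 24.60375. Both are assumptions of this check, and the names are illustrative.

```python
a, delta, n1 = 4, 10, 3          # intra-block constants of the design
a_hat = 5.68206                  # printed value of a + 2d

d = (a_hat - a) / 2              # recovered, since d is not printed
delta_hat = delta + 2 * d * (a + n1) + 4 * d ** 2
c_hat = a_hat / delta_hat
c1_hat = 1 / delta_hat

print(round(d, 5), round(delta_hat, 5), round(c_hat, 5), round(c1_hat, 6))
```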


The best combined estimate of the difference between treatments 1 and
2 is then found to be

$$\hat{t}_1 - \hat{t}_2 = -3.683.$$
5. Construction of other two replicate designs 


In like manner, two replicate designs can be constructed from any
partially balanced association scheme with m > 2 classes, by first
constructing a partially balanced incomplete block design with k = 2
and $\lambda_1 = \lambda_2 = \cdots = \lambda_j = 1$, $\lambda_{j+1} = \cdots = \lambda_m = 0$ (for some j), and then dualizing
it. By replacing each object by a group of t (t > 2) objects in a partially
balanced association scheme with m associate classes, one again gets a
partially balanced association scheme with (m + 1) associate classes.
This result can be used in constructing other two replicate designs.
Another way would be to replace each treatment in an SPLB design by
a group of t (t > 2) treatments. This, however, will not be pursued
in this paper.
THE CENTRIC SYSTEMATIC AREA-SAMPLE TREATED AS A
RANDOM SAMPLE


A. Milne*
School of Agriculture, King’s College 
Newcastle upon Tyne, England 


1. INTRODUCTION 


The fundamental answer required from the sampling of any popu- 
lation is a sufficiently narrow definition of the limits within which the 
true mean lies. The populations concerned here are those of organisms 
(or their individual characteristics) and of their environmental factors 
as distributed over land, on the surface or below. 

Statisticians have pointed out not only the theoretical objection 
but also the possibility of danger attached to systematic sampling 
treated as if random (e.g. Finney [1947, 1948]; Yates [1953]). Yet 
contemporary literature shows that field workers commonly take the 
systematic area-sample (which involves much less trouble in locating 
sampling points) and cheerfully analyse it as if it were random. The 
present paper attempts to assess the amount of danger actually involved 
with respect to the “centric” systematic area-sample. The assessment 
is to be made on the basis of (i) practical tests on complete enumerations 
of biological variates, i.e., populations for which the true mean is 
known (§§2-4) and (ii) knowledge of spatial distributions in the field 
(§5).

Figure 1 shows what is meant by a centric systematic area-sample. 
The units of the sample lie on equi-distant parallel lines (not shown) 
but these are so arranged that in effect (see dotted lines) the area is 
divided into equal squares and a sampling unit taken from the centre of 
each square. The distance of any outer unit from the edge is then 
half that between neighbouring units. It should be noted that there 
is only one centric sample of a given size for any area. There are, of 
course, numerous eccentric systematic samples of the same size, the 
number being limited only by the sampling interval; in Fig. 1, for 
example, if the sampling interval is every tenth unit there are 10 × 10 =
100 systematic samples possible (by moving the grid over the surface),
including the one centric individual. The latter individual is preferred 
because it does not favour any side of the area more than another. A 
second reason for preferring the centric individual will emerge in the 
Discussion (§5.1). 


FIGURE 1 
EXAMPLE OF A CENTRIC SYSTEMATIC AREA-SAMPLE 


It is necessary to be rather tediously exact about what is meant by
treating the centric systematic sample as if it were a random sample.
If the variate x has a normal frequency distribution then, in the complete
array of all possible unit combinations ($^N C_n$), the sample mean $\bar{x}$
is normally distributed about the population mean X with standard
deviation $\sigma/\sqrt{n}$ (where σ is the standard deviation of the total
population, N, of sampling units, and n is the number of units in the sample).
Hence, with a random sample, the fiducial limits for X (i.e. the probability
statement of the fundamental answer) are given by $(\bar{x} \pm t s_{\bar{x}})$. In this
expression, t is the value (in the Table of “t”) appropriate to n and the
required probability level, while $\bar{x}$ and $s_{\bar{x}}$ (the latter being the estimate
of $\sigma/\sqrt{n}$) are calculated by simple arithmetical procedures from the
sample data. Now there is nothing to hinder one from applying to the
systematic sample the selfsame arithmetical procedures which result
in $\bar{x}$ and $s_{\bar{x}}$ for the random sample. To avoid confusion, let the results
for the systematic sample be called m and e respectively instead of $\bar{x}$ and
$s_{\bar{x}}$. Then if $(m \pm te)$ is regarded as the fiducial limits for X, the systematic
sample has been treated as if it were a random sample.
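The “selfsame arithmetical procedures” can be written as one small function. The sketch below is a hypothetical helper: it assumes the usual sample standard deviation with divisor n − 1 and applies no finite-population correction (the text does not mention one), and the caller supplies t, read from the Table of “t” for n − 1 degrees of freedom. Applied to a random sample it returns the limits $(\bar{x} \pm t s_{\bar{x}})$; applied to the centric systematic sample, the same call returns $(m \pm te)$.

```python
import math
import statistics


def fiducial_limits(sample, t):
    """Return (lower, upper) = mean -/+ t * s_xbar for a sample treated as random.

    s_xbar = s / sqrt(n), with s the usual (n - 1)-divisor standard deviation;
    no finite-population correction is applied (an assumption of this sketch).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - t * se, mean + t * se
```

For example, for the five values 4, 6, 5, 7, 3 with t = 2.776 (4 degrees of freedom, P = 0.05), `fiducial_limits([4, 6, 5, 7, 3], 2.776)` gives limits of about 3.04 and 6.96.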
In the practical tests, the hypothesis to be examined empirically is
that $(m \pm te)$ gives as good an answer as $(\bar{x} \pm t s_{\bar{x}})$. Answers will, of
course, be equally good if X is included equally often within limits
equally narrow. How can goodness of answer be compared?

Given complete enumeration of populations, the reliability of
$(\bar{x} \pm t s_{\bar{x}})$ as a probability statement may be demonstrated either (a)
by repeated random sampling from one particular population or (b)
by taking a single random sample from each of a number of different
populations. Thus for the 0.95 fiducial interval, the expectation in
both cases is that $(\bar{x} \pm t s_{\bar{x}})$ will include X in 95 per cent of samples.
Obviously, since there is but one centric systematic sample of a given
size for any population, comparison of this kind of systematic sampling
with random sampling can only be done according to method (b).
From the practical viewpoint, method (b) is quite pertinent because 
not only is a single random sample sufficient to furnish the probable 
answer concerning a population but also a single sample is all that is 
taken in usual practice. There are two kinds of random sample, how- 
ever, the unrestricted and the stratified. Fortunately, the centric 
systematic sample, because of its form, lends itself equally well to 
treatment either as unrestricted or as stratified random. Thus the 
hypothesis above can be tested on a practical basis by taking three 
samples of the same size—one unrestricted random, one stratified 
random, and the one centric systematic—from each population in a 
representative series of populations. 

Before going further, it might be as well to point out three things 
the writer is not doing in this paper. 

Firstly, the writer is not trying to find the standard error of the 
mean of the centric systematic sample as such. The standard error is 
an estimate of the standard deviation of the population of similarly- 
derived sample means. With a total population of one, the centric 
systematic sample mean can have no standard error in the accepted 
sense. 

Secondly, the writer is not treating the centric systematic sample
as though it were random “on the basis of a pseudo-theoretical argument
that the same units might have been obtained in a truly random sample”
(as has been said of others). It is true that all possible combinations
of units have an equal chance of arising as the random sample. But
the chance of the unique centric systematic sample arising is so
infinitesimally minute as to be not worth consideration. Even with N as
small as 100, the chance of the centric systematic sample corresponding
to n = 10 turning up is a mere 1 in 17,310,309,456,440.
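The figure quoted is the number of ways of choosing 10 units out of 100, which Python's standard library computes exactly:

```python
import math

# Distinct unrestricted samples of n = 10 units from N = 100 units;
# the centric systematic sample is only one of them.
ways = math.comb(100, 10)
print(f"1 in {ways:,}")   # 1 in 17,310,309,456,440
```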

Thirdly, the writer is not attacking random sampling theory. He 
is merely making an empirical approach to the question: is there much
danger or does one go far wrong in analysing the centric systematic 
sample as though it were random? 


2. THE COMPLETE ENUMERATIONS 


The making of a complete enumeration is usually both onerous and 
costly. One does not expect very many to have been made. But it is 
extremely disappointing to find, as so often in following up a reference, 
that the maker has published not the complete enumeration but his 
conclusions (right or wrong!) on some narrow statistical aspect of it. 
This is particularly true of, but not confined to, forestry work; a round 
dozen references failed to produce a single complete enumeration! The 
writer took every enumeration of more than 200 units that he could find. 

Mercer and Hall [1911] seem to have made the first substantial 
complete enumerations in so-called “uniformity trials” of wheat and
mangolds (which like all subsequent uniformity trials showed not 
uniformity but “the practical universality of field heterogeneity” as 
Harris [1920a] puts it!). Over the next two decades, others made 
complete enumerations of a variety of agricultural, horticultural, and 
orchard crops, mainly in order to study the effect of different shapes 
and sizes of plots in field trials. Much searching produced a bare score 
of such enumerations. In one or two of these cases, a series of complete 
enumerations had been made at different times on the same subdivision 
of the same area; use could be made of only one enumeration in each 
series because the units of a field tend more or less to produce the same 
amounts relative to one another from crop to crop (cf. Harris [1920b])
and the centric sample necessarily always contains the same units. 
Again, for the latter reason, only one variate could be used when several 
correlated variates had been measured simultaneously in an enumer- 
ation area. The above 20 complete enumerations were each used in 
their entirety except in one case (Kalamkar [1932a]) where, for con- 
venience in sampling, the last 6 of the 96 rows (6 per cent of the data) 
were discarded. 

Apparently insects are the only animals for which sizable complete 
enumerations by area have been made, and only three sources were 
found in the course of a most diligent search of scientific journals. 
Fleming and Baker [1936] counted Japanese Beetle larvae by square 
feet in the soil of four 50 × 50 ft. plots in two fields. The enumeration
data have not been published in detail but Dr. W. E. Fleming kindly 
supplied photographic copies of his original mapped records. The 
Fleming and Baker data provided 20 enumerations for test, 4 of 2500
units each and, by quartering the plots, 16 of 600 each (one column of
each quarter being rejected for convenience in sampling). Marshall
[1936] counted American bollworm eggs on maize and from his Fig. 1
together with his Appendix II the number per sq. yd. can be mapped
in 552 sq. yd. units, i.e. material for one test. Beall [1938] gives fully
mapped details of counts of Colorado Beetle on potatoes for 2304 units
of area; by quartering, these data provided a further five enumerations
for test. Dr. John MacLeod kindly furnished unpublished records of 
blowfly numbers from 84 traps set regularly over a field. The writer’s 
own contribution was a count of Garden Chafer Beetles per sq. yd. in 
275 sq. yds., which is to be published in full elsewhere. 

The total of enumerations (populations) for test was now 48. 
A 49th was compiled by counting hills as an environmental factor in
2½ × 2½ mile units of a 50 × 50 mile area in the Border
Country (England-Scotland) as shown in the “Geographia Map”
Sheet No. 1. The Table of “t” (Fisher and Yates [1943] Table III,
Distribution of “t”), with the last four columns and last four rows
discarded, was employed as the 50th complete enumeration; the “t”
values were to be regarded as some imaginary biological or
environmental variate with a distribution such that unit value increases
consistently along the direction of the sampling lines (see §5.1).

Authors invariably subdivided their study areas into equal units 
(for enumeration) by means of square or rectangular grids. Mapping 
then merely entailed tabulation by column and row. 

The column and row totals for Populations 1-49 showed, when 
graphed, no features whereby the spatial distribution of one kind of 
plant or one kind of insect could have been distinguished from that of 
another; nor indeed any distinction between plants, insects, and hills. 

Frequency distributions of the variates run from normal to extremely 
skewed. The distribution is normal or very slightly skewed in Popu- 
lations 1, 2, 4, 8, 9, 11, 13, 17, and 20; appreciably skewed in 3, 5 to 
7, 10, 14 to 16, 18, 19, 21, and 25 to 28; very skewed in 12, 22, 23, 30, 
32, 33, 35, and 36; and extremely skewed in 24, 29, 31, 34, and 37 to 50 
(see Table 1 for numbering). The data were used without normalising 
transformations, because $\bar{x}$ tends to be normally distributed even when
the distribution of x is quite skewed, and, in field practice, normalising 
transformation of the variate is usually unnecessary. 


3. THE METHODS OF SAMPLING 


As noted in the Introduction, three samples of the same size were 
to be taken from each population: an unrestricted random sample, 
designated URS; a stratified random sample, SRS; and the centric 
systematic sample, CSS, which would be treated first as if it were 
unrestricted random, i.e. CSS/URS, and then as if stratified random,
i.e. CSS/SRS. Stratification was to be at the rate of two sampling 
units per stratum if possible, and not more than three units otherwise. 
Actually only five of the fifty populations required three units per 
stratum: Populations 2, 4, 15, 17, and 18. Fig. 2 shows how the CSS
exemplified in Fig. 1 would be stratified for CSS/SRS.

FIGURE 2
DIVISION OF FIG. 1 INTO TEN EQUAL STRATA WITH TWO SAMPLING
UNITS PER STRATUM

The corresponding SRS would have the same strata, thus ensuring a fair comparison,
but the two units in each stratum would, of course, be selected at 
random. The strata of a population (complete enumeration) were all 
equal in size except in eight of the fifty cases. In these eight cases, 
one stratum (two in one case) was slightly larger or smaller than the 
rest (see Appendix I for details). These inequalities were necessary 
either in order to avoid the discarding of data in already small popu- 
lations or to take advantage of the author’s provision of the standard 
deviation in large populations. The inequalities were trifling and, 
since both SRS and CSS/SRS suffered identically, no adjustment for 
inequality of strata was made in the calculation of the statistics. 
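For SRS and CSS/SRS alike, the mean and its standard error are built up from the strata rather than from the pooled units. The sketch below uses the standard textbook estimate for strata of equal weight, $\bar{x}_{st} = \frac{1}{L}\sum_h \bar{x}_h$ with estimated variance $\sum_h W_h^2 s_h^2/n_h$ and $W_h = 1/L$; following the text, strata are treated as equal, and no finite-population correction is applied. These formulae are an assumption, since the text does not state the ones it used, and the function name is hypothetical. With two units per stratum, $s_h^2$ reduces to $(x_1 - x_2)^2/2$.

```python
import math
import statistics


def stratified_mean_se(strata):
    """Mean and estimated standard error for strata of equal weight.

    `strata` is a list of lists, one list of sampled unit values per stratum.
    Uses var = sum(W_h^2 * s_h^2 / n_h) with W_h = 1/L and no finite-population
    correction (assumptions of this sketch).
    """
    L = len(strata)
    w = 1.0 / L
    mean = sum(w * statistics.fmean(s) for s in strata)
    var = sum(w * w * statistics.variance(s) / len(s) for s in strata)
    return mean, math.sqrt(var)
```

Applied to the CSS units grouped by stratum, the same function gives the CSS/SRS values of m and e.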

With N varying from 84 to 2500, size of sample was a many-sided 
problem. At least two things seemed certain: the practical man takes 
the smallest sample that will serve his purpose (e.g. 20 tiny units per 
field in a wireworm survey, see Finney [1946]), and some allowance 
should be made for population size. It was finally decided that a
sample of 20 units would be taken from populations of 500 units and
over; 10 to 20 units from populations of 200 to 500, with n roughly in
proportion to N within this range (actually there had to be some
compromise on this point, see Appendix II); and 10 units from populations
of less than 200 (three cases: 184, 160, and 84). This defines “the
prescribed size, p” of samples, a phrase used for economy in Appendix
II. In one case, Population 15, the sample size had to be 21 units.

In actual field practice, one can adjust sample size to the area and 
shape of the field or plot so as to have the truly centric sample. But in 
sampling a published complete enumeration, with sample size fixed, 
there may, according to the numbers of rows and columns, be 0, 1, or 
4 co-equal alternatives for choice as the centric sample. Owing to the 
well-known correlation between neighbouring units (see §5.2), the 
alternatives should usually vary little in character. Nevertheless, in 
order to avoid temptation to take the better or best of the alternatives, 
rules were laid down which enabled the co-ordinates of the systematic 
sample to be determined objectively. To apply these rules, the only 
requirement was a knowledge of the number of columns and the number 
of rows in the enumeration table. Details of the rules are given in 
Appendix II. It should be noted that CSS units are not only centric 
to the whole area but also centric to their respective strata. 
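The Appendix II rules themselves are not reproduced in this section, but the idea of centring can be sketched: divide a side of the enumeration table into k equal stripes and take the unit containing the midpoint of each stripe. The hypothetical helper below does this with exact integer arithmetic. Where a midpoint falls exactly on the boundary between two units it takes the higher-numbered unit, whereas Table 2(a) shows that the paper's rules sometimes chose the lower one (columns 2, 6, 10 for the 12 columns of Population 17, where this sketch gives 3, 7, 11), so ties should be settled by Appendix II rather than by this code. In non-tied cases it agrees with Table 2(a), for instance rows 7, 21, …, 119 for the 125 rows of Population 3.

```python
def centric_positions(length, k):
    """1-based indices of k centred sampling lines along a side of `length` units.

    The side is cut into k equal stripes; stripe i (0-based) has its midpoint
    at (2i + 1) * length / (2k), and the unit containing that point is chosen.
    Integer floor division keeps the midpoints exact. A midpoint lying exactly
    on a unit boundary goes to the higher-numbered unit (tie rule assumed here).
    """
    return [((2 * i + 1) * length) // (2 * k) + 1 for i in range(k)]


print(centric_positions(125, 9))  # [7, 21, 35, 49, 63, 77, 91, 105, 119]
print(centric_positions(50, 4))   # [7, 19, 32, 44]
```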

Fisher and Yates’ [1943] Tables of Random Numbers were employed 
for selecting the random samples. 

The parameters of the populations (complete enumerations) are 
given in Table 1; the statistics calculated from the samples in Table 2. 
The individual complete enumerations are numbered “Populations 1,
2, 3, …, 50” in both Tables, with author(s) and year of publication
added in Table 1. Publication details are given in the references at the 
end of the paper. 

The abbreviations URS, SRS, CSS, CSS/URS, and CSS/SRS, as 
defined earlier in this section, are used wherever possible to save space. 
It should be noted that the final “S” stands for either “sample” or
“sampling” according to the context in which the abbreviation occurs. 


4. RESULTS OF SAMPLING 


It is known that SRS tends to give a mean nearer the true mean and 
a smaller standard error than URS. The truth of this is quite apparent 
in the present data and no further mention need be made of the matter 
as such. Attention will be confined to the comparison of results from 
centric systematic sampling and random sampling. 
TABLE 2(a)
DETAILS PERTAINING TO THE SAMPLING OF THE FIFTY POPULATIONS

n = number in sample (size of sample). Cols. = columns.
Plan Cols./Rows: published plan* (tabulation) of the complete enumeration.
Sample Cols./Rows: co-ordinates of the systematic sample.
The last column gives the numbers of strata employed for restriction in sampling.

Popln. No. | Plan Cols. | Plan Rows | n  | Sample Cols.   | Sample Rows                             | Strata
1          | 25         | 20        | 20 | 3, 8, 13, 18   | 3, 8, 13, 18, 23                        | 10
2          | 10         | 16        | 12 | 2, 6, 9        | 3, 7, 11, 15                            | 4
3          | 12         | 125       | 18 | 4, 10          | 7, 21, 35, 49, 63, 77, 91, 105, 119     | 9
4          | 14         | 16        | 12 | 3, 8, 12       | 3, 7, 11, 15                            | 4
5          | 16         | 80        | 20 | 5, 13          | 5, 13, 21, 29, 37, 45, 53, 61, 69, 77   | 10
6          | 12         | 24        | 10 | 4, 10          | 3, 8, 13, 17, 22                        | 5
7          | 20         | 20        | 16 | 3, 8, 13, 18   | 3, 8, 13, 18                            | 8
8          | 10         | 20        | 10 | 3, 8           | 3, 7, 11, 15, 19                        | 5
9          | 10         | 60        | 20 | 3, 8           | 4, 10, 16, 22, 28, 34, 40, 46, 52, 58   | 10
10         | 6          | 90        | 20 | 2, 5           | 5, 14, 23, 32, 41, 50, 59, 68, 77, 86   | 10
11         | 8          | 23        | 12 | 3, 7           | 2, 6, 10, 14, 18, 22                    | 6
12         | 6          | 45        | 10 | 2, 5           | 5, 14, 23, 32, 41                       | 5
13         | 4          | 50        | 10 | 2, 4           | 6, 16, 26, 36, 46                       | 5
14         | 10         | 50        | 20 | 3, 8           | 3, 8, 13, 18, 23, 28, 33, 38, 43, 48    | 10
15         | 20         | 50        | 21 | 4, 10, 17      | 4, 11, 18, 26, 33, 40, 47               | 7
16         | 15         | 33        | 16 | 4, 12          | 3, 7, 11, 15, 19, 23, 27, 31            | 8
17         | 12         | 20        | 12 | 2, 6, 10       | 3, 8, 13, 18                            | 4
18         | 14         | 26        | 15 | 3, 7, 12       | 3, 8, 14, 19, 24                        | 5
19         | 10         | 28        | 14 | 3, 8           | 2, 6, 10, 14, 18, 22, 26                | 7
20         | 8          | 28        | 14 | 3, 7**         | 2, 6, 10, 14, 18, 22, 26                | 7
21-24      | 50         | 50        | 20 | 7, 19, 32, 44  | 6, 16, 26, 36, 46                       | 10
25-40      | 24         | 25        | 20 | 4, 10, 16, 22  | 3, 8, 13, 18, 23                        | 10
41         | 48         | 48        | 16 | 7, 19, 31, 43  | 7, 19, 31, 43                           | 8
42-45      | 24         | 24        | 16 | 4, 10, 16, 22  | 4, 10, 16, 22                           | 8
46         | 24         | 23        | 16 | 4, 10, 16, 22  | 3, 9, 15, 21                            | 8
47         | 6          | 14        | 10 | 2, 5           | 2, 5, 8, 10, 13                         | 5
48         | 11         | 25        | 10 | 3, 9           | 3, 8, 13, 18, 23                        | 5
49         | 20         | 20        | 16 | 3, 8, 13, 18   | 3, 8, 13, 18                            | 8
50         | 9          | 30        | 12 | 3, 7           | 3, 8, 13, 18, 23, 28                    | 6

*See text for the few cases in which a small section of the enumeration data was discarded.
**Batchelor & Reed numbered these columns 6, 2; they numbered from right to left!
4.1 The sample means, $\bar{x}$ and $m$

When a single random sample of a given size is taken from each one 
of a series of differing large populations, and when each sample mean is 
expressed as a percentage of the true mean of the population from 
which it came, then the series of transformed sample means so obtained 
should tend to be distributed normally around the value 100 per cent. 
Fig. 3 reveals this tendency quite obviously in the cases of URS and
SRS; what is more interesting, the same tendency is apparent in the
case of CSS. In this respect, then, the centric systematic sample probably
does not differ from the random sample.

FIGURE 3
FREQUENCY DISTRIBUTION OF $\bar{x}$ (OR $m$), WITH $\bar{x}$ (OR $m$) EXPRESSED AS % OF X
(50% REPRESENTS 45.0-54.9%, AND SO ON)

Fig. 3 also suggests that on the average the CSS mean comes nearest 
to the true mean. One would expect this from the published findings of 
other workers on systematic sampling (e.g. Osborn [1942], and references 
in various textbooks). 

Expressing the deviation of $\bar{x}$ or $m$ from X as %X in each population
and ignoring sign (because the interest here is in magnitude, not direc- 
tion of deviation), the average and range of deviations were as follows: 
          URS           SRS           CSS
Average   11.24         9.42          6.79
Range     0.00-77.68    0.67-49.86    0.06-31.89


Analysis of variance (components being sampling methods, populations, 
and error) gave F, for methods-variance/error-variance, just significant 
at the 0.05 level, but the standard error of the difference of two averages 
(1.80) indicated a significant difference only between URS and CSS 
(P < 0.02).

Turning now to performance in the individual populations: the mean
from CSS was nearer X than was the mean from SRS in 32 of 49 cases,
there being one instance of equality of $\bar{x}$ and $m$. If no difference existed
between the two kinds of sample mean, the number expected would
be 24.5 out of 49. $\chi^2$ (corrected) shows P to be just less than 0.05.
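The corrected χ² for 32 cases out of 49, against an expected 24.5, can be sketched as follows. For one degree of freedom the upper-tail probability is $\mathrm{erfc}(\sqrt{\chi^2/2})$, so only the standard library is needed. This assumes Yates' continuity correction, which is presumably what “corrected” refers to; the helper name is hypothetical.

```python
import math


def corrected_chi2_half(successes, trials):
    """Yates-corrected chi-square (1 d.f.) for a count against a 1:1 split."""
    expected = trials / 2
    chi2 = 2 * (abs(successes - expected) - 0.5) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 d.f.
    return chi2, p


chi2, p = corrected_chi2_half(32, 49)
print(round(chi2, 3), round(p, 4))       # 4.0 0.0455
```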

On the whole, then, it may be concluded that centric systematic 
sampling gives at least as good a mean as, if not rather a better mean 
than, random sampling. 

It is interesting to note that only three of the 150 sample means
were beyond the limits $(X \pm 1.96\,\sigma/\sqrt{n})$. These three all occurred
in URS, viz.

Population | $\bar{x}$ | Limits
18         | 228.9     | 229.9-312.2
34         | 4.65      | 2.34-4.54
50         | 1.482     | 0.489-1.180


4.2 $e$ and the standard error, $s_{\bar{x}}$


The frequency distributions of $s_{\bar{x}}$ and of $e$, both $s_{\bar{x}}$ and $e$ being
expressed as percentages of $\sigma/\sqrt{n}$, are illustrated in Fig. 4. These
show remarkable similarities of form as well as the expected bias towards
under-estimation of $\sigma/\sqrt{n}$.


The average and range of per cent of $\sigma/\sqrt{n}$ in each of the
distributions were:

          URS           SRS           CSS/URS       CSS/SRS
Average   97.89         84.17         91.27         81.72
Range     35.2-264.2    30.4-134.6    58.3-132.5    38.8-151.3
FIGURE 4
FREQUENCY DISTRIBUTION OF $s_{\bar{x}}$ (OR $e$), WITH $s_{\bar{x}}$ (OR $e$) EXPRESSED AS % OF $\sigma/\sqrt{n}$


Analysis of variance (components as before) gave F highly significant
(P < 0.01), but the standard error of the difference of two averages
(4.55) indicated that there was no significant difference either between
URS and CSS/URS or between SRS and CSS/SRS.

For the individual populations, the number of cases in which one
sampling method gave a smaller value for $s_{\bar{x}}$ or $e$ than another may be
tabulated thus:

CSS/URS   URS  |  CSS/SRS   SRS
   30      20  |     29      21


On the hypothesis that there is no difference between the members of a 
pair, the true proportions would be 25:25. One member of a pair would 
then have to score 33 or more cases before a significant departure from 
that hypothesis would be indicated at the 0.05 level of probability.
The $\chi^2$ test therefore confirms the result of the analysis of variance
above.
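The threshold of 33 can be reproduced with a Yates-corrected χ² for a 1:1 split, here with 50 comparisons; that this is the test intended is an assumption, consistent with the corrected χ² used in §4.1. The sketch searches for the smallest score whose corrected χ² is significant at the 0.05 level, and also evaluates the observed scores of 30 and 29. The helper name is hypothetical.

```python
import math


def corrected_p(successes, trials):
    """Upper-tail P of the Yates-corrected chi-square (1 d.f.) against 1:1."""
    expected = trials / 2
    chi2 = 2 * (abs(successes - expected) - 0.5) ** 2 / expected
    return math.erfc(math.sqrt(chi2 / 2))


threshold = next(k for k in range(26, 51) if corrected_p(k, 50) < 0.05)
print(threshold)                                                     # 33
print(round(corrected_p(30, 50), 3), round(corrected_p(29, 50), 3))
```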

The conclusion is: on the whole the data do not contradict the idea
that $e$, derived from the centric systematic sample, gives as good an
estimate of $\sigma/\sqrt{n}$ as does $s_{\bar{x}}$, derived from a single random sample.


4.3 The limits of X as defined by $(\bar{x} \pm t s_{\bar{x}})$ and by $(m \pm te)$


In calculating limits for X from $(\bar{x} \pm t s_{\bar{x}})$ and $(m \pm te)$, the value
of t was read for (n − 1) degrees of freedom at the 0.05 probability level.
Let the interval between the higher and lower limit be called the “limits
interval.” This interval failed to include X in 11 of the 200 cases: 3
from URS, 2 from SRS, 2 from CSS/URS, and 4 from CSS/SRS. For
these 11 cases, the difference between X and the nearer end of the limits
interval, expressed as %X, is given below, along with (in brackets) the
value of $s_{\bar{x}}$ or $e$ expressed as per cent of $\sigma/\sqrt{n}$:


Popln. | URS          | SRS          | CSS/URS       | CSS/SRS
3      | -            | 0.19 (72.7)  | -             | -
6      | 0.11 (35.2)  | -            | -             | 0.86 (78.0)
8      | 0.15 (71.1)  | 0.43 (30.4)  | -             | -
31     | -            | -            | 10.99 (58.3)  | 11.84 (55.9)
38     | 7.17 (71.5)  | -            | 5.26 (65.6)   | 1.21 (75.7)
46     | -            | -            | -             | 23.30 (53.9)


None of the 11 sample means concerned was beyond $(X \pm 1.96\,\sigma/\sqrt{n})$.
The failure was due solely to under-estimation of $\sigma/\sqrt{n}$ in every case
(see brackets above). It is also interesting to note that the three URS
cases (Populations 18, 34, and 50, see before), wherein $\bar{x}$ was beyond
the limits $(X \pm 1.96\,\sigma/\sqrt{n})$, were saved from failure by over-estimating
$\sigma/\sqrt{n}$, the estimates being 111.1, 109.8, and 264.2 per cent respectively.

With 3, 2, 2, and 4 failures (wrong answers) in 50 trials each, it 
cannot be said that any of the four methods was better or worse than 
another in respect of failures. Tabulation of the cases of near-failure 
confirms this, viz. 


Nearer end of the limits interval
included X by less than       URS   SRS   CSS/URS   CSS/SRS
1% of X                        1     1       2         0
3% of X                        5     5       4         5
5% of X                       11     9       9         9
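Both tabulations above rest on one simple classification of each limits interval: does it include X, and how far is X from the nearer end of the interval, as a percentage of X? A minimal sketch, with a hypothetical helper name and with the standard error ($s_{\bar{x}}$ or $e$) supplied by the caller:

```python
def limits_check(mean, se, t, X):
    """Classify the limits interval (mean - t*se, mean + t*se) against X.

    Returns (included, gap_pct), where gap_pct is the distance from X to the
    nearer end of the interval as a percentage of X. For a failure it is the
    amount by which the interval misses X; for a success, the margin by which
    the nearer end includes X.
    """
    lower, upper = mean - t * se, mean + t * se
    included = lower <= X <= upper
    gap = min(abs(X - lower), abs(X - upper))
    return included, 100 * gap / X


# Illustrative numbers only (t = 2.093 is the 0.05 value for 19 d.f.).
print(limits_check(105.0, 2.0, 2.093, 100.0))   # interval misses X
print(limits_check(100.5, 2.0, 2.093, 100.0))   # interval includes X
```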
4.4 Conclusion


Since X was included equally often within limits equally narrow,
$(m \pm te)$ gave as good an answer as $(\bar{x} \pm t s_{\bar{x}})$. That is, within the scope
of the available experimental material, the CSS treated as if random 
was no more dangerous than a truly random sample. The real interest 
of this finding, however, lies in the question of its general applicability. 


5. DISCUSSION AND GENERAL CONCLUSION 


Bad samples (those giving a wrong answer) are far outnumbered 
by the good samples existing potentially in any biological population. 
For instance, given N units of normal frequency and “t” read for P =
0.05, only five per cent of all possible combinations of n units can give
a wrong answer when subjected to the arithmetical procedures symbolised
by $(\bar{x} \pm t s_{\bar{x}})$. This being so, the immediate results of the experiment
are just what would be expected provided (i) there were no systematic 
pattern of spatial distribution (capable of defeating CSS), and (ii) the 
average correlation between the values of adjacent units were positive. 
Hence it is on the basis of conditions (i) and (ii) that one can try to 
decide how far the experimental finding is applicable to outdoor popu- 
lations of organisms in general. These conditions are best known by 
field workers, particularly ecologists. Others should not be offended if 
they are asked to avoid extrapolation from the relative simplicity and 
regularity of laboratory or factory circumstances into the appalling
complexity and irregularity of the field.
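The five-per-cent figure for random samples can be illustrated by simulation. The sketch below draws many unrestricted random samples from a synthetic, normally distributed population (not one of the fifty enumerations), applies $(\bar{x} \pm t s_{\bar{x}})$ to each, and reports the proportion of intervals that include X; with t for P = 0.05 this proportion should come out close to 0.95. Population size, sample size, number of trials and seed are arbitrary choices, no finite-population correction is used, and the function name is hypothetical.

```python
import math
import random
import statistics


def coverage(population, n, t, trials, rng):
    """Proportion of unrestricted random samples whose limits include X."""
    X = statistics.fmean(population)          # true population mean
    hits = 0
    for _ in range(trials):
        sample = rng.sample(population, n)    # n units without replacement
        m = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        hits += (m - t * se) <= X <= (m + t * se)
    return hits / trials


rng = random.Random(1)
population = [rng.gauss(50, 10) for _ in range(500)]              # synthetic, N = 500
cov = coverage(population, n=20, t=2.093, trials=2000, rng=rng)   # t: 19 d.f.
print(cov)   # close to 0.95
```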


5.1 Systematic patterns of spatial distribution 


Both Finney [1947, 1948, 1950] and Yates [1953] have pointed 
out the danger to systematic sampling arising from unsuspected: (a) 
periodic variation, (b) consistent increase, in unit value along the direc- 
tion of the sampling lines; and (c) “marked strip effects running in 
straight lines across the material in such a manner that the whole of 
one line of sample points falls on the same strip” [Yates, 1953]. 

Fairly consistent increases (b) are not unknown in the field, but 
even completely consistent increase would obviously be no danger to 
CSS because of the latter’s centricity (cf. results from the Table of “t”, i.e.,
Popln. No. 50). This is the second reason for preferring the centric to
the eccentric systematic sample (the first was given in §1). 

Marked strip effects (c) are relatively rare in nature; they are seen,
for example, where different geological strata outcrop within an area to 
produce different soil and moisture conditions. Strip effects can also 
be imposed by drastic human activities, e.g. draining, manuring,
cultivating, etc. Marked strip distribution of organisms is clearly due 
to equally marked strip distribution of environmental conditions. The 
latter generally furnish external (visible) evidence of their presence in 
the quality or quantity or in the species composition of the vegetation. 
This is direct evidence of strip distribution to the plant sampler and 
indirect evidence of its possibility to the animal sampler. Such evidence 
would lead a careful worker to sample obviously different strips as 
separate populations. Moreover, there is no danger even in the absence 
of visible evidence. Strip distribution, if sufficiently marked to be a 
danger, must show itself in the sample data—provided these data are 
recorded on the systematic plan, as they should be. 

The remaining pattern, periodic variation (a), requires fuller con- 
sideration. 


5.11 Occurrence of periodic variation in the field 


Finney’s [1950] “example of periodic variation in forest sampling” 
seems to be the only instance of a natural periodicity ever claimed for 
a biological variate over land. The forest, Dehra Dun, is a natural 
growth in India. Finney describes the data as follows: “In 1947 a 20
per cent enumeration … was made, using strips of 2 chains width but
enumerating only every fifth strip of the total number that could have
been taken (across a part of the forest stated to be about 36 miles long).
The trees in each 5-chain length of strip were recorded separately, so
giving recording units of one acre.” Altogether 292 strips were enumerated.
Since the forest is very irregular in outline, these strips varied
considerably in length (2 to 50 units or 10 to 250 chains). Finney 
calculated the mean volume of timber per unit (one acre) for each 
enumerated strip, regarding these means as “strip-values” … “without
any attempt to make allowance for difference in strip length.” Ultimately
he demonstrates the periodicity by taking all of the 17 systematic
samples available from an interval of 17 strips, i.e. the first sample
consists of strip-values 1, 18, 35, …, the second of strip-values 2, 19,
36, …, and so on up to the seventeenth, which consists of strip-values
17, 34, 51, …. The averages of these seventeen samples are shown to
rise fairly consistently from 943 cu. ft. of timber per acre in the first
sample to 1314 in the 7th and then to fall, again fairly consistently, to
879 in the 17th (see Finney’s Table II). This is proof of periodicity
and Finney concludes: “Undoubtedly then, the 292 strips of Dehra Dun
show periodicity, with a sequence of rise and fall in volume of timber 
repeated every seventeen strips (i.e. 85 strips of the whole area) .... 
No explanation of this phenomenon can be offered. Of course, it could 
arise from a regular variation in some soil or topographical feature,
but inquiry from those who know the region has failed to disclose any 
tendency for hills and valleys to occur regularly at 170-chain intervals 
over the 36 miles of forest sampled! ...” 
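Finney's device of taking every 17th strip-value, starting from each of the first 17 positions, is easy to express in code. The sketch below returns the 17 sample means for any ordered list; it is run here on a synthetic list with a built-in 17-strip period, since the Dehra Dun strip-values are not reproduced in this paper. The function name is hypothetical.

```python
import math


def systematic_means(values, k):
    """Means of the k systematic 1-in-k samples from an ordered list.

    Sample j (0-based) takes positions j, j + k, j + 2k, ..., i.e. strip-values
    1, 18, 35, ... for the first sample when k = 17.
    """
    means = []
    for j in range(k):
        sample = values[j::k]
        means.append(sum(sample) / len(sample))
    return means


# Synthetic list of 292 values with a period of 17 (not Finney's data).
synthetic = [1000 + 200 * math.sin(2 * math.pi * i / 17) for i in range(292)]
means = systematic_means(synthetic, 17)
print([round(m) for m in means])   # rises and falls across the 17 samples
```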
The Dehra Dun findings intrigued the present writer immensely
since all his field experience has led him to conclude that natural spatial 
periodicity is non-existent. Dr. Finney supplied details of the unit 
values in the 292 strips. Checking confirmed his evidence of periodicity. 
In evaluating a forest, however, allowance would assuredly be made 
for strip length. But clear evidence of periodicity still remains when 
strip length is taken into account (Fig. 5A). 

Finney then kindly asked Dr. K. R. Nair, of the Indian Forest 
Research Institute, to send the writer a map of the sampling procedure. 
This showed only the outline of the forest area with the numbers and 
positions of the strips. But it revealed a fact of which Finney had been 
entirely unaware, namely, the forest area had a length greater than 
36 miles. A section equivalent to 24 sample strips (3 miles) had not 
been enumerated between the strips numbered 187 and 188 in his data 
list. In other words, the list of 292 strips did not comprise a single 
systematic sample of the forest area. Therefore, to investigate the 
possibility of natural periodicity in space (as opposed to periodicity in 
a list on paper) by a 1 in 17 systematic sample one must deal with the 
data in either of two ways. 

The first way is to omit ten of the strips listed after No. 187 (i.e. 
strips 188 to 197 inclusive), which amounts to omitting 24 + 10 = 34 
(fifth) strips of the forest or two whole “periods.” The systematic 1 
in 17 operation would then be performed on a list comprised of strip- 
values 1 to 187 plus 198 to 292. When this is done, there is no evidence 
of periodicity (Fig. 5B). 
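The 1 in 17 systematic operation on a list of strip-values can be sketched in a few lines of Python. The sketch covers both the full list and the first way just described, which omits strips 188 to 197. It also covers the two separate lists used in the second way. It assumes the strip-values are held in a plain list in strip order. The function names and the synthetic data are hypothetical and are not Finney's figures.

```python
from statistics import mean


def systematic_sample_means(values, k):
    """Means of the k systematic 1-in-k samples of a list.

    Sample s (s = 1, ..., k) takes list positions s, s + k, s + 2k, ...
    (1-based), which are Python indices s - 1, s - 1 + k, ...
    """
    return [mean(values[start::k]) for start in range(k)]


def drop_strips(values, first, last):
    """Remove strips numbered first..last inclusive (1-based numbering)."""
    return values[: first - 1] + values[last:]


# Synthetic, hypothetical strip-values (NOT the Dehra Dun data) with a
# period of 17 built into the list order.
strip_values = [1000 + 100 * ((i % 17) - 8) for i in range(292)]

whole_list = systematic_sample_means(strip_values, 17)    # cf. Fig. 5A
trimmed = drop_strips(strip_values, 188, 197)             # the "first way"
trimmed_means = systematic_sample_means(trimmed, 17)      # cf. Fig. 5B
part_1 = systematic_sample_means(strip_values[:187], 17)  # cf. Fig. 5C
part_2 = systematic_sample_means(strip_values[187:], 17)  # cf. Fig. 5D
print(whole_list[:3], len(trimmed))
```

Slicing with a step (`values[start::k]`) is what makes each sample a systematic 1-in-k selection. Dropping ten strips changes which strip-values fall into each sample.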

The second way is to perform the operation independently on two 
lists, namely, strips 1 to 187 and strips 188 to 292. When this is done, 
the periodicity inherent in the original list (strips 1 to 292, Fig. 5A) 
reappears in each of its two parts (Figs. 5C and 5D). Now if there 
really is this natural periodicity in Dehra Dun, one is not surprised to 
find evidence of it in two different lengths of the forest. But since 187 
is an exact multiple of 17, i.e. strip 187 is the end of a “period,” and 
since strip 187 is separated from strip 188 by 1½ “periods,” one would
expect Fig. 5D to be low in the middle and high at the ends instead of 
high in the middle and low at the ends. This suggests that perhaps the 
periodicity is not in nature at all. The suggestion is strengthened 
when Figs. 5C and 5D are examined in detail. Natural spatial varia- 
tion always has an irregular component. Hence one does not expect a 
smooth or even completely consistent sample-to-sample rise from a 
to g or fall from h to q (Figs. 5C and 5D). But, equally, one would 
certainly never hope to find the pattern of irregularity (i.e. of unsmooth-
ness or inconsistency) in one part of Dehra Dun to be repeated in
another part. Yet when one compares the portions 

d-e-f-g 

h-i-j-k-l 

in Fig. 5C with the same portions in Fig. 5D, it is quite clear that the 
general pattern of irregularity is identical practically throughout the 
two figures. This all leaves the present writer with the not unreasonable 
impression that the periodicity does not really exist in Dehra Dun but 
is a result of the mechanics of the enumeration. Unfortunately Sahai 
[1947] does not give sufficient information to settle the question. He 
does say, however, that the forest portions given to each enumerator 
in charge of a gang were 4 × 1 miles; and it will be noted that half a
“period” is approximately 1 mile. Again, his “Note to Enumerators”
says: “Every enumerator should start work from the strip nearest
to his headquarters westwards. When the work becomes distant he 
should start work eastwards. When work on both sides becomes quite 
far he should shift camp to some other convenient place.” These facts 
from Sahai suggest possible ways in which a periodicity could have 
been produced in data from an area actually innocent of periodicity. 
But it would be a waste of time to state them. The Indian Forest 
Research Institute is alone in a position to settle the question—by 
post mortem on the mechanics of the original field sampling and by 
resampling. 

If the 8506 recording units could be accepted as a reliable reflection 
of reality, would the CSS be a dangerous proposition in Dehra Dun?
Fig. 6 is a hypothetical example of the kind of periodicity which is 
dangerous to two-dimensional systematic sampling. Unfortunately, 
owing to the irregular forest outline, the units cannot be “lined up” 
(as in Fig. 6) throughout the length of Dehra Dun or of either of its 
two parts. The longest section of the forest with approximately parallel 
sides is where strips 181 to 185 occur. These strips are ten units long. 
Table 3A shows the total cubic feet in each unit, and the irregularity 


TABLE 3

EXCERPT OF ENUMERATION DATA FROM DEHRA DUN (see text)

A. Unit Values (Cu. Ft.)

Strip
Number
181        665   480   645   985   410   475   750   605  1540  1240
182        515  1265   480  1190   855   845  1025   730   800   915
183        695  1005  1810   755   560   825   465  1810   625   955
184        810   715  1100  1150   945   890   830   960   885   805
185        785   545   950  1130  1360  1085  1370   695  1140  1375

B. Rises (+) and Falls (−) between Units in Adjacent Strips

Strips
181-182      −     +     −     +     +     +     +     +     −     −
182-183      +     −     +     −     −     −     −     +     −     +
183-184      +     −     −     +     +     +     +     −     +     −
184-185      −     −     −     −     +     +     +     −     +     +


of the values seems to be typical of the forest. Table 3B transforms 
the data in terms of Fig. 6. From the position of strips 181 to 185 in the 
“period,” the trend in each line of units, from strip to strip, should tend 
to be a consistent fall (−). Instead, there are altogether 21 rises (+)
and 19 falls (−) in the ten lines. Clearly there is no sign of dangerous
periodicity in strip-to-strip lines of units. 
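The rise-and-fall count of Table 3B can be reproduced directly from the unit values in Table 3A. The Python sketch below compares each unit with the unit in the same position in the next strip. The function name is hypothetical.

```python
# Unit values (cu. ft.) for strips 181-185, transcribed from Table 3A.
table_3a = {
    181: [665, 480, 645, 985, 410, 475, 750, 605, 1540, 1240],
    182: [515, 1265, 480, 1190, 855, 845, 1025, 730, 800, 915],
    183: [695, 1005, 1810, 755, 560, 825, 465, 1810, 625, 955],
    184: [810, 715, 1100, 1150, 945, 890, 830, 960, 885, 805],
    185: [785, 545, 950, 1130, 1360, 1085, 1370, 695, 1140, 1375],
}


def rise_fall_signs(table):
    """Sign of the change from each unit to the same unit in the next strip.

    Returns {(s, s_next): ['+' or '-', ...]} for consecutive strips.
    Ties would be marked '0' (none occur in Table 3A).
    """
    strips = sorted(table)
    signs = {}
    for a, b in zip(strips, strips[1:]):
        signs[(a, b)] = [
            "+" if y > x else "-" if y < x else "0"
            for x, y in zip(table[a], table[b])
        ]
    return signs


signs = rise_fall_signs(table_3a)
all_signs = [s for row in signs.values() for s in row]
rises, falls = all_signs.count("+"), all_signs.count("-")
print(rises, falls)  # 21 19, the counts given in the text
```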

Studies of spatial distributions and their environmental causes form 
a major part of ecological work. Topographical features are among the
grossest environmental factors. Nowhere in the wide world is there 
a periodicity in such features (see any atlas). As on the largest en- 
vironmental scale (e.g. the distribution of rivers over a continent) so 
it is the same right down to the smallest (e.g. soil moisture in a field). 
Indeed it is an ecological truism that irregularity is not only always 
present in natural environment but also almost always the dominant 
characteristic in the question of pattern. There are no reasonable 
grounds for expecting spatial periodicity anywhere on this earth except 
where man himself, either directly or indirectly, has imposed periodic
conditions sufficiently powerful to override the natural environmental 
irregularity. Thus, for example, one might expect periodicity for some 
soil property where land has been set up in broad regular ridges (primi- 
tive cultivation) or where trees have been planted at regular intervals 
(modern forestry, cf. Elton, [1949]), but certainly not in virgin prairie 
or natural forest. 

Populations 1 to 48 (all from farms, orchards, or gardens) show no 
sign of periodicity in their enumerations. This is not surprising since 
most human effort for cropping is applied with uniform, not periodic, 
intensity over land. In fact, apart from equal spacing of trees and of 
surface drainage or irrigation channels, no other human operations 
liable to produce periodicity come to mind. Spatial periodicity in the 
distribution of organisms must be a rather rare phenomenon. 


5.12 Suspected and unsuspected periodic variation 


The possibility of man-made spatial periodicity will nearly always 
be suspected either from external signs or past history. When suspected, 
the CSS can still be used without danger provided each unit is the total 
or average value of a series of equi-distant sub-units (“recording units”)
straddling the entire period on which it lies. Of course, suspicion may 
well be needless for a number of reasons, two of which must suffice: 
(i) The imposed periodic treatment might be vitiated by the intrinsic 
irregularity of the environment. (ii) The organism’s tolerance might 
be equal to or greater than the range of conditions in the period. The 
writer has some evidence for this in connection with the “primitive
cultivation” mentioned earlier and still showing on old permanent
pastures in Britain: garden chafer grubs are always unaffected in their
distribution by the ridging although leatherjacket grubs do exhibit a 
faint glimmer of periodicity in their tendency to be a little more numerous 
in the intervening “furrows” (moister conditions) than on the tops of 
the “ridges” (drier) in drought years. 

Finally there is the question of unsuspected periodicity. To be 
certain of a bad sample or wrong answer from CSS, the following five 
requirements must be met (remembering that there are two sets of 
sampling lines at right angles to each other): (a) one set of sampling 
lines must be in the same direction as the periodicity; (b) the sampling 
interval must be equal to the periodic interval, or some exact multiple 
of it; (c) the sampling points must fall near the “lows” (or “highs”)
of the periods; (d) periods must all have the same maximum and mini-
mum value, or, alternatively, every period must be sampled; (e) there
must be comparatively slight variation along any line at right angles to
the periodicity. The chance must be small that the circumstances of
the area and of the sampler’s blind choice of grid will dovetail to the 
extent that even one of the requirements (a), (b), and (c) will be ful- 
filled; still less the chance that two will be fulfilled at once; and positively 
minute the chance that all three will be fulfilled simultaneously. On 
top of that, it would be contrary to all field experience if requirements 
(d) and (e) were even closely approached. Deviations from any of the 
five requirements can be very large. But even small deviations will 
tend strongly to bring the sample into the overwhelmingly large group 
which must give the right answer (see beginning of §5). Unfortunately 
this cannot be tested on any field enumeration of two-dimensional 
periodic variation, because none has been published. However, in 
taking a 1 in 17 sample, the Dehra Dun data, because it is reduced to 
one dimension (a list of 292 strip-values), can be regarded as a hypo- 
thetical field case in which all the requirements for a wrong answer 
except one, namely (c), have been fulfilled for the CSS. The deviation 
from (c) could not be smaller because the sampling points for the 
centric (i.e. ninth) sample fall next to those for the “highs” as shown 
by the eighth sample being the peak value in Fig. 5A. Nevertheless 
the centric sample does not give the wrong answer when treated as 
random: (m ± te) is 887 to 1316 while the true mean is 1028 (for strip
values) or 1013 (for recording units). (As a matter of interest, only three
of the seventeen systematic samples do give a wrong answer.) From 
all these considerations, this writer believes that the danger to CSS 
from unsuspected periodic variation is so small as to be scarcely worth 
a thought. 
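Treating a systematic sample "as random" means forming the usual interval around its mean. The sample is then judged by whether the true mean falls inside that interval. The Python sketch below makes several assumptions. It reads the interval as mean ± t × (standard error). It takes the standard error as s/√n with no finite-population correction. It expects the caller to supply t. Reading the text's "te" as t times a standard error is itself an assumption, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def interval_as_if_random(sample, t):
    """Mean +/- t * s / sqrt(n), treating the sample as a random one.

    s is the sample standard deviation; no finite-population correction
    is applied (an assumption of this sketch).
    """
    n = len(sample)
    m = mean(sample)
    half_width = t * stdev(sample) / sqrt(n)
    return m - half_width, m + half_width


def gives_right_answer(interval, true_mean):
    """True when the interval covers the true population mean."""
    low, high = interval
    return low <= true_mean <= high


# The centric Dehra Dun interval quoted above, against the strip-value mean.
print(gives_right_answer((887, 1316), 1028))  # True
```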


5.2 Correlation 


If spatial distribution were random (no correlation) the CSS (or 
any other sample) would be equivalent to a random sample. But, as 
is well-known, the production of plants and animals is never distributed 
entirely at random over any area. Yield depends largely on the im- 
mediate environment. Higher-yielding and lower-yielding sub-areas 
of differing extent are distributed irregularly over the area, grading 
into one another. The result is that values for adjacent units tend to 
be alike because units are usually considerably smaller than the sub- 
areas themselves. In other words there is positive correlation between 
neighbouring units, so if one value is high there will be a grouping of 
high values round it, and the values of more distant units must on the 
average be low. Accordingly, a systematically dispersed sample will 
tend to be made up of units more or less negatively correlated, and 
hence tend also to furnish a better mean and a smaller “standard error” 
than is the case with a random sample. Actually the mean of n units
would have a “standard error” equal to the square root of

$$\frac{\sigma^2}{n}\{1 + (n - 1)\rho\},$$

where σ² is the variance of a single unit and ρ is typically the
correlation between two sampling units; ρ is on the average negative,
the area being finite, so that the “standard
error” is less than σ/√n. Thus the results of our experimental com-
parison of CSS and random sampling (see Figs. 3 and 4) would be 
expected from the positive correlation of adjacent units and the virtual 
absence of dangerous periodicity in the field. (Incidentally, one or 
two of the enumeration authors did give statistical evidence of this 
positive correlation of adjacent units. In view of this, and of Fig. 4, 
the present author did not think that laborious correlation computations 
were necessary to prove negative correlation between CSS units.) 
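The effect of correlation on the "standard error" of a mean can be checked numerically. This Python sketch assumes every unit has the same standard deviation σ. It uses the standard result Var(mean) = (σ²/n²) Σ_i Σ_j ρ_ij, which equals (σ²/n){1 + (n − 1)ρ̄} when ρ̄ is the average correlation between distinct units. The equicorrelated matrix and the function names are hypothetical illustrations.

```python
from math import sqrt


def se_of_mean(sigma, corr):
    """Standard error of the mean of n units with common s.d. `sigma`.

    `corr` is an n x n correlation matrix with corr[i][i] == 1, and
    Var(mean) = sigma**2 / n**2 * sum over i, j of corr[i][j].
    """
    n = len(corr)
    total = sum(sum(row) for row in corr)
    return sigma * sqrt(total) / n


def equicorrelated(n, rho):
    """Hypothetical n x n matrix with every off-diagonal correlation = rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


sigma, n = 10.0, 25
print(se_of_mean(sigma, equicorrelated(n, 0.0)))    # 2.0, i.e. sigma / sqrt(n)
print(se_of_mean(sigma, equicorrelated(n, -0.02)))  # about 1.44, below sigma / sqrt(n)
```

With slightly negative average correlation, as expected between widely dispersed units, the standard error falls below σ/√n. Positive correlation would raise it.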

If the correlation between adjacent units were negative, the mean
from CSS would, of course, be poorer and the “standard error” greater
than σ/√n. It is true that a tree will not grow in the shade of a parent
and that one cock bird will not permit another in its territory. But 
the sampling unit is always several (usually many) times greater than 
the area required by one individual, so negative correlation between 
adjacent units never enters into the question of field sampling. 


5.3 General conclusion 


Undoubtedly random sampling is an insurance against every 
eventuality in spatial distribution but only periodic variation need or 
could be dangerous to CSS. Periodic variation does not occur in a
state of nature, and rarely as a result of human activities. When 
occurring it will rarely be unsuspected, and only unsuspected periodicity 
need be a danger. When occurring unsuspected, it will rarely, if ever, 
be sufficiently perfect in pattern to be a danger. If occurring un- 
suspected and nearly perfect in pattern, there is still only a very small 
chance that the CSS will concur dangerously with the pattern. In 
short, the risk of periodic variation defeating the CSS is so very small 
that it can justifiably be ignored. 

The general conclusion may be stated thus: with proper caution, 
one will not go very far wrong, if wrong at all, in treating the centric 
systematic area-sample as if it were random. I hasten to add that this 
conclusion is strictly confined to the sampling of populations of organisms 
(or their individual characteristics) and of their environmental factors 
as distributed over land, on the surface or below (see Introduction). 
Moreover, I hope that this paper will not encourage anyone to believe 
that the insistence of statisticians on random sampling can in general
be ignored. 


6. SUMMARY 


The practical answer required from the sampling of biological 
populations distributed over land is stated. The centric systematic 
area-sample (CSS) is defined. Sampling from fifty complete enumer- 
ations showed that the CSS, analysed as if random, gave an answer 
as reliable and precise as a solitary random sample. From this, together 
with ample knowledge of spatial distributions in the field, the general 
conclusion is: with intelligent caution, one will not go very far wrong,
if wrong at all, in analysing the CSS as if it were random.
APPENDIX I
The Eight Cases of Inequality in Stratum Size (See §3.) 


Popln. 3: where 8 strata had 168 units each and 1 stratum 156. 
Popln. 6: where 4 strata had 60 units each and 1 stratum 48. 
Popln. 11: where 5 strata had 32 units each and 1 stratum 24. 
Popln. 15: where 6 strata had 140 units each and 1 stratum 160. 
Popln. 16: where 7 strata had 60 units each and 1 stratum 75. 
Popln. 18: where 4 strata had 70 units each and 1 stratum 84. 
Popln. 46: where 6 strata had 72 units each and 2 strata 60 each. 
Popln. 47: where 4 strata had 18 units each and 1 stratum 12. 


APPENDIX II 


Rules for Objective Determination of the Co-ordinates of the Centric Systematic
Area-Sample (See §3.)


Let p be the prescribed (i.e. desired) size of sample (as explained
in §3). Let A be the total number of columns and B the total number of rows
in the complete enumeration, B being equal to or greater than A. Also
let the number of co-ordinate columns and rows be respectively a and b.
Then a and b (both necessarily whole numbers) were decided as the best
compromise between (i) ab = p and (ii) a/b = A/B and (iii) a is 2, 3,
or 4 (preferably not 3) and B/b is as near as possible a whole number.
If B is less than A, then b must be 2, 3, or 4 and A/a as near as possible
a whole number. Having decided a and b, the actual columns and
rows for co-ordinates were found as follows: Let A/a = x and B/b = y;
then the columns are at ½x, 1½x, 2½x, ..., and the rows at ½y, 1½y,
2½y, ..., the values 0.0 to 0.9 representing the first column or row,
1.0 to 1.9 the second column or row, 2.0 to 2.9 the third column or row,
and so on, the columns being numbered from left to right and the
rows from top to bottom of the enumeration table irrespective of how 
the author of the table had done his numbering. The co-ordinates are 
all shown in Table 2. 
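The co-ordinate rule above can be written as a short function. This Python sketch reads "0.0 to 0.9" as the interval [0, 1). A position p therefore lies in column (or row) floor(p) + 1. It assumes a and b have already been chosen. The function name and the 12 × 30 example grid are hypothetical.

```python
import math


def centric_coordinates(A, B, a, b):
    """Columns and rows of a centric systematic area-sample.

    x = A/a and y = B/b; sample points sit at 0.5x, 1.5x, 2.5x, ... across
    and 0.5y, 1.5y, 2.5y, ... down.  A position p in [k, k + 1) is column
    (or row) k + 1, so the 1-based index is floor(p) + 1.
    """
    x, y = A / a, B / b
    columns = [math.floor((k + 0.5) * x) + 1 for k in range(a)]
    rows = [math.floor((k + 0.5) * y) + 1 for k in range(b)]
    return columns, rows


# Hypothetical enumeration of 12 columns by 30 rows, with a = 4, b = 8.
cols, rows = centric_coordinates(12, 30, 4, 8)
print(cols)  # [2, 5, 8, 11]
print(rows)  # [2, 6, 10, 14, 17, 21, 25, 29]
```

Here B/b = 3.75 is not a whole number, so the row spacing alternates between 4 and 3. This is the situation the rule tries to keep small by choosing b so that B/b is nearly whole.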
SENSORY ITEM SORTING¹


N. T. GRIDGEMAN


Division of Applied Biology, National Research Council 
Ottawa, Ontario, Canada 


Summary 


A probabilistic model for dichotomous sensory sorting is described; 
it covers many experimental designs including the simplest, pair com- 
parison. In a theoretical analysis of the probabilities involved, dis- 
tinction is drawn between the perception and the characterization of an
objective difference. In some circumstances a faculty of matching is
postulated. Furthermore, if the contrasted stimuli differ in magnitude,
the probability of correct ranking has to be invoked. Finally, there is 
the concept of a probability of preference, of interest for its own sake 
or as a means of testing discriminability. 


1. Consider a group of M items of which N possess a certain stable 
attribute that is objectively determinate but subjectively equivocal. 
We are asked to dichotomize the group, sorting off, by sensory percep- 
tion alone, an N-sized subgroup that we hope is entirely attributive. 
The sorting process will be a psychosensory mélange in which some items 
will be assayed more than once, but the outcome, the number R 
(0 ≤ R ≤ N) of correct allocations, may reasonably be expected to fit
a statistical model. Define p as the probability of sensory perception 
(i.e., not of correct allocation, which includes chance selection) of the 
attribute in an item, and suppose that unidentified attribute items plus 
all non-attribute items are randomly allocated. This is tantamount 
to an assumption that the sorted-off subgroup is effectively made up 
of some identified items, some rightly there by lucky guess, and the 
remainder, (N − R), wrongly there by unlucky guess. (The word
“effectively” is important; it is not of course suggested that there is 
necessarily an awareness of which are the identified and which the 
guessed-at items.) 

2. The term “attribute” is to be widely interpreted as any appropri-
ate characteristic (it may for instance simply be a higher stimulus
intensity), and for this purpose is always to be regarded in the context
of the non-attribute items (the probability of our identifying an orange 
among apples is not the same as that of identifying an orange among 
lemons). The term “identify” is used rather loosely; a more rigorous 
view will be taken later when the relevant faculties and probabilities 
come to be considered in detail. 

3. Translated into urn language, the situation demands one large 
urn of balls of which a proportion p is red and the rest are blue, and one 
smaller urn that contains exactly (M − N) white balls. Proceed as
follows: (i) draw a random sample, size N (and containing, say, X reds),
from the large urn; (ii) separate the (N − X) blues from the sample,
add them to the small urn, and mix; (iii) draw (N − X) balls at random
from the small urn to restore the primary sample to size N. The 
number of non-white balls now in this sample is R, with the red balls 
representing identifications and the blue balls lucky guesses. 
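The urn scheme translates directly into a Monte Carlo simulation of a single sorting trial. In this minimal Python sketch the function name, the seed and the parameter values are illustrative only. The loop draws X reds from the large urn. It then moves the N − X blues into the small urn of M − N whites and draws N − X balls without replacement.

```python
import random


def simulate_sort(M, N, p, rng):
    """One sorting trial under the urn model; returns R (non-white balls)."""
    # (i) N draws from the large urn, each ball red with probability p.
    reds = sum(rng.random() < p for _ in range(N))
    blues = N - reds
    # (ii) add the blues to the small urn of M - N whites and mix.
    small_urn = ["white"] * (M - N) + ["blue"] * blues
    # (iii) draw N - X balls without replacement to restore size N.
    refill = rng.sample(small_urn, blues)
    return reds + refill.count("blue")


rng = random.Random(1959)
trials = [simulate_sort(M=10, N=5, p=0.3, rng=rng) for _ in range(20000)]
print(sum(trials) / len(trials))  # Monte Carlo estimate of E(R)
```

For M = 10, N = 5 and p = 0.3, the average should settle near the exact expectation of about 2.988. That value follows from the expected-value formula given later in paragraph 6.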

4. We note that R may appear in any one of (R + 1) possible 
ways, depending on the number of reds and blues present. The prob- 
ability of any given combination will be the product of the Bernoulli 
binomial probability of the number of reds and the hypergeometric 
probability of the number of blues (for the third step is sampling without 
replacement from a 2-component universe). Symbolically, we have 


$$P(R, X) = \binom{N}{X} p^X q^{N-X}\,\frac{\binom{N-X}{N-R}\binom{M-N}{M-2N+R}}{\binom{M-X}{M-N}} \tag{1}$$

where q = 1 − p. As, clearly, the probability of the occurrence of R,
regardless of composition, will be the sum of (1) for all X (R ≥ X ≥ 0),
we finally obtain, for either non-white balls or correctly allocated items,


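$$P(R; N) = \sum_{X=0}^{R} \binom{N}{X} p^X q^{N-X}\,\frac{\binom{N-X}{N-R}\binom{M-N}{M-2N+R}}{\binom{M-X}{M-N}} \tag{2}$$

5. A working form of (2) is,

$$P(R; N) = \frac{N!\,\{(M-N)!\}^2}{\{(N-R)!\}^2\,(M-2N+R)!} \sum_{X=0}^{R} \frac{(N-X)!}{X!\,(R-X)!\,(M-X)!}\,p^X q^{N-X} \tag{3}$$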


300 BIOMETRICS, JUNE 1959 


an expression that simplifies in an obvious way for the common restric- 
tion N = M/2. Another special case is N = 1; then the probability 
that the single item is correctly chosen (either by discrimination or by 
guesswork) becomes, 


$$P(R = 1) = p + q/M \tag{4}$$


From (4) appropriate expressions for pair comparison (M = 2) and
the “triangle” test (M = 3) follow immediately [12]. In this sense pair
comparison is a limiting case of sorting, just as in other contexts it can 
be regarded as a limiting case of ranking or matching. 
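Expression (2) can be evaluated directly with integer binomial coefficients. This Python sketch needs Python 3.8+ for `math.comb`, which returns 0 when the lower index exceeds the upper one, so impossible combinations drop out automatically. It checks that the probabilities sum to one and that N = 1 reproduces (4). The function names are hypothetical.

```python
from math import comb


def prob_R(R, N, M, p):
    """P(R; N) from expression (2): binomial (reds) times hypergeometric (blues)."""
    q = 1 - p
    total = 0.0
    for X in range(R + 1):
        binomial = comb(N, X) * p**X * q**(N - X)
        hyper = comb(N - X, N - R) * comb(M - N, N - R) / comb(M - X, N - X)
        total += binomial * hyper
    return total


def distribution(N, M, p):
    """[P(0; N), P(1; N), ..., P(N; N)]."""
    return [prob_R(R, N, M, p) for R in range(N + 1)]


print(sum(distribution(N=5, M=10, p=0.3)))      # close to 1
print(prob_R(1, N=1, M=3, p=0.2), 0.2 + 0.8 / 3)  # the two agree, as (4) requires
```

Here comb(M − N, N − R) is written in place of comb(M − N, M − 2N + R). The two are equal, and the first avoids a negative lower index when N > M/2.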

6. The expected number of correct allocations (see Appendix) is,

$$E(R) = \sum_{X=0}^{N} \binom{N}{X} p^X q^{N-X}\,\frac{N^2 + X(M - 2N)}{M - X} \tag{5}$$

which enables a consistent estimate of the parameter p to be made from
an observed R. Especially when the design is complex, trial-and-error
substitution in (5) is easier than setting up and solving the Nth-degree
equation in p. If necessary, the result can be refined by maximi-
zation of the likelihood. Estimation of the parameter may be needed
for its own sake or to help check the fit of the model [10].

7. Expression (5), divided by N to give E(R/N), is monotonically
decreasing in M and increasing in N; since M cannot be less than 2, it
is greatest at M = 2. In other words, pair comparison can be expected
to yield a higher proportion of successes (for a given p) than any other
homologous design. This fact is interesting, although it tells us nothing
about efficiency (which depends on power) in hypothesis testing.

8. Many examples of the use of sensory sorting for hypothesis
testing occur in flavor technology, where the method is applied to the
checking of paired batches of a product for organoleptic uniformity,
i.e. for H_0 (p = 0). Rejection criteria stem from the probabilities of
R_n, the number of correct or specified decisions in n replicate trials, on
the null hypothesis. Some of these probabilities have to be computed,
and some are available. Roessler et al. [16] have tabulated 1- and 2-tail
significance criteria at α = 0.05, 0.01, and 0.001, for M = 2 and 3,
N = (necessarily) 1, and n = 5(1)50(10)100. For many nN < nM/2,
Mainland’s binomial tables [14] for 1- and 2-tail work at α = 0.05 and
0.01 are useful.

9. The relative efficiency of the various designs cannot be determined
by theory alone. As the basis of cost will be the number of items to be
sensorily assayed in one test, the question becomes: “For a specific
attribute and a given nM, what values of n and N are most likely to
lead to a rejection of H_0?” Calculation of power curves for several nM
up to 64, with M > 8, indicates a comparatively small power range
among the various designs, and no noticeable trends. As, moreover, 
we cannot assume that, for a given subject (or group of subjects), a 
constant p will apply to all possible designs, these calculations will not 
be reproduced here. There is evidence, incidentally, [8, 10, 12] that the 
simpler the design, the greater the value of p, so that simplicity is 
normally advantageous. 
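Paragraph 6 recommends estimating p by trial-and-error substitution in (5). E(R) rises steadily with p. It runs from N²/M under pure guessing (p = 0) to N when p = 1. Bisection is therefore one systematic way to carry out that substitution. It is a swapped-in numerical method, not the author's procedure. The Python sketch assumes the observed value is a mean R over replicate trials, and the function names are hypothetical.

```python
from math import comb


def expected_R(N, M, p):
    """E(R) = sum_X C(N, X) p^X q^(N-X) [N^2 + X(M - 2N)] / (M - X)."""
    q = 1 - p
    return sum(
        comb(N, X) * p**X * q**(N - X) * (N**2 + X * (M - 2 * N)) / (M - X)
        for X in range(N + 1)
    )


def estimate_p(R_mean, N, M, tol=1e-10):
    """Solve E(R) = R_mean for p on [0, 1] by bisection.

    Returns 0.0 (or 1.0) when R_mean is at or below (above) the range
    that E(R) can take.
    """
    lo, hi = 0.0, 1.0
    if R_mean <= expected_R(N, M, lo):
        return 0.0
    if R_mean >= expected_R(N, M, hi):
        return 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_R(N, M, mid) < R_mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Pair comparison (M = 2, N = 1): E(R) = p + q/2, so a mean of 0.75 gives p = 0.5.
print(estimate_p(0.75, N=1, M=2))  # approximately 0.5
```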

10. The psychosensory genesis of p may be complex. In principle, 
sensations can be fully described in terms of quality, intensity, and 
hedonics, and they can be partially quantified in terms of arbitrary 
scoring scales laid against the second and third of these characteristics. 
Possible kinds or components of p may be systematized as follows: 


p_d = primary probability of sensory discrimination of the difference.

p_s = secondary probability (if p_d > 0) of specific selection, sub-
divisible into p_sq for qualitative characterization, and p_si for
intensity ranking.

p_p = secondary probability (if p_d > 0) of preference (hedonics).


There may conceivably be a small difference for which p_d ≠ 0 while
p_s = 0 or p_p = 0; that is, the subject detects a difference yet is unable
to specify it or to have a preference across it; and experimentalists have
in fact found evidence for this [8, 15]. Of the three secondary prob-
abilities, p_sq is most susceptible to improvement by memory and training.
By contrast, p_si is generated against the background of a sensory
intensity continuum of reasonable stability. And p_p may be regarded
[9] as based on a hedonic continuum that, although not necessarily
stable, cannot meaningfully be “improved.”

11. Returning to binary language and attributiveness, we may
symbolize the difference between the two stimuli as Δ(A, B), where A
is the attribute and B = not-A. Then if p_sq is relevant, the sort of
question will be, “Which N are normal (or synthetic, or foreign, or
healthy, or sweet, or ...)?” And if p_si, “Which N are bigger (or
louder, or brighter, or harder, or sweeter, or ...)?” These are par-
ticularizations of the generic question, “Which N are A?”, whose
response uncertainty gives rise to the probabilities p_d and/or p_s. Note
that if N ≠ M/2 there is likely to be a “confounding” of p_s with the
subgroup size difference (and in the limit, when N is unity, p_s becomes
inoperative because the subject merely seeks the odd item among the
M). If p_p is at issue, it is particularly advisable to have N = M/2; the
question then is, “Which N do you prefer?”, and R is defined as the
larger number of similar items in either sorted-off subgroup, and assess-
ment is 2-tail.
12. In all trials (no matter what the form of the question) for
which M > 2, an element of matching will enter the sorting process,
stronger in the physical senses (especially vision), in which several
items can be perceived more or less simultaneously, than with the
chemical senses. It may be that a probability of matching, p_m, is a
component of the sorting probability complex; a specific matching
faculty can certainly be demonstrated, and a corresponding probability
can be estimated (see paragraph 15, below). In replicate trials we
assume no inter-subgroup matching; nevertheless memory plays a part,
and the subject will tend to label A consistently, although exhibiting
no improved identification.

13. The secondary probabilities have a directional aspect because
of the possibility of recognition of Δ(A, B) coupled with a mistaking
of A for B and vice versa. Misdirection (perversity) is more likely with
p_sq than with p_si (and has no meaning with p_p). This suggests the
replacement of p_s by a pair of alleloprobabilities: p_s(+) for the correct,
and p_s(−) for the wrong, direction. If for instance we ask a subject
to select the 5 new items from a group of 10, and he promptly picks out
the 5 old ones, this action is more likely to have arisen from a mis-
identification of newness (i.e., p_s(−) ≈ 1) than from absence of identi-
fication (i.e., p_d = 0). These considerations influence the choice
between 1- and 2-tail probabilities in significance assessment.

14. With the reservations implicit in paragraph 11 (above), most
of what we say about p_s also applies to p_p. Incidentally, preference
questions can be used to test the hypothesis that Δ(A, B) = 0, because
a non-random set of answers is evidence that p_d ≠ 0 (although of course
a random set of answers lends no support to the hypothesis that p_d = 0).
A practical application of this fact has been described by Gray [6].

15. The basic pair comparison design, or modifications of it, can
be used to estimate the various empirical probabilities discussed above.
The standard type of pair comparison, which takes the question,
“Which member of the pair is ...er?”, provides frequency estimates
of p_sq, p_si, or p_p (no matching is involved here, or any seeking of a
sensory difference per se, because the difference is given; it is the pair
split itself). To estimate p_d we present the subject with 8 items as 4
coded pairs, AA, AB, BA, and BB, in random order, and we ask him,
“Two of the pairs are composed of identical, and the others of dissimilar,
items; which pairs are which?” Only 1-tail probabilities are needed,
for it is absurd to suppose that the two types of pairs could be mis-
identified. Finally, to estimate p_m, we present the coded AB plus an
extra labeled A and ask, “Which member of the pair matches the
standard?” Presumably this is the same when a labeled B is made the
standard; in practice both should be used alternately. A variant is to
supply the two standards together.

16. Analysis of M > 2 designs in terms of the specific probabilities
is seldom easy. In the well-known “triangle test” (M = 3; N = 1;
question, “Which member of the triad is odd?”), for instance, a blend
of discrimination (p_d) and matching (p_m) is involved, but it is not obvious
how to formulate the blend.

17. In all sorting designs judgment is subject to a positional or 
time-order bias [11]. This must be taken care of, either by appraisal 
of the M items in all permutations, or by the use of a random selection 
of the permutations, or by the use of a suitably balanced design (see, for 
instance, Ferris’s test designs [4]). 

18. We have so far assumed that replication is “true,” i.e., that n
samples with a common probability p are drawn, but this is an ideal not
commonly realized. If the replications are shared among several
subjects, p is unlikely to be homogeneous. Heterogeneity will increase
or decrease the risk of an error of the second kind (i.e., of the mistaken
acceptance of the null hypothesis) according to whether the mean p
is, respectively, less or more than the p′ = f(R/n) of the rejection criterion.
This is because samples from a Bernoulli-binomial universe are more 
variable than those from the corresponding Poisson-binomial universe 
[13]. Statistically speaking, the point is of slight import; however it 
is to be noted that sometimes, and especially if p, is at issue, heter- 
ogeneity may be desirable because the experimentalist wants his results 
to represent some population of subjects. 
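Paragraph 8 discusses the rejection criteria that this argument turns on. For designs with N = 1, H_0 (p = 0) makes each trial a success with probability 1/M, which is (4) with p = 0. Under that assumption R_n is binomial. The Python sketch below finds the smallest 1-tail criterion; the function names are hypothetical. It is not a reproduction of the published tables.

```python
from math import comb


def upper_tail(n, r, prob):
    """P(R_n >= r) for R_n ~ Binomial(n, prob)."""
    return sum(comb(n, k) * prob**k * (1 - prob) ** (n - k) for k in range(r, n + 1))


def one_tail_criterion(n, M, alpha):
    """Smallest number of correct decisions in n trials significant at alpha.

    Assumes N = 1, so each trial succeeds with probability 1/M under H_0.
    Returns None if even n successes out of n is not significant.
    """
    for r in range(n + 1):
        if upper_tail(n, r, 1 / M) <= alpha:
            return r
    return None


print(one_tail_criterion(10, 2, 0.05))  # 9 (pair comparison, 10 trials)
```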

19. Dichotomous sorting is a special case of polytomous sorting. 
But if the number of subgroups exceeds two, complexities mount, 
additional parameters are required, and assumptive difficulties arise. 
An obvious complication is that if the subject fails to identify an item, 
he is not equally likely to assign it to any one of the incorrect sub- 
groups. Solomon [17] has discussed certain of the relevant problems. 
Nevertheless, provided that our interest goes no further than the 
testing of the null hypothesis that “there is no perceptible difference 
among the S different kinds of items,” some headway can be made. If 
there are exactly M/S identical items in each subgroup, and the subject 
is asked to sort accordingly, the total number H of correct allocations 
can be regarded as the number of correct matchings, this word being 
now used in its mathematical rather than its perceptual sense. Gilbert 
[5], utilizing results of Greville [7], has tabulated exact cumulative 
frequencies of random matchings for 44 sets of conditions up to M = 25 
(his s is our S, his ce our M/S, and his h our H). In the same paper 
Gilbert has also shown how some formulas developed by Battin [1] 
can be used to test the null hypothesis (of random allocation) when the 
suit sizes differ in matched equal-sized decks of cards. This situation 
corresponds, in the field of sensory perception, to the subject’s being 
unaware of the sizes of the subgroups. This type of test has been 
considered at least once [2] in the literature of flavor testing. 
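For small M, the null distribution of H can be enumerated exactly. Gilbert's tables supply this kind of distribution for larger cases. The sketch below treats every distinct arrangement of the M items into S subgroups of M/S items as equally likely under random allocation; the function names are ours. Brute force is only practical for a handful of items.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def matching_distribution(S, per_group):
    """Exact null distribution of H, the number of correct allocations, when
    M = S * per_group items (per_group of each of S kinds) are sorted at random
    into S subgroups of size per_group."""
    truth = tuple(k for k in range(S) for _ in range(per_group))
    arrangements = set(permutations(truth))        # distinct, equally likely sortings
    counts = Counter(sum(a == t for a, t in zip(arr, truth)) for arr in arrangements)
    total = len(arrangements)
    return {h: Fraction(c, total) for h, c in sorted(counts.items())}

def upper_tail(dist, h):
    """P(H >= h) under random allocation."""
    return sum(p for k, p in dist.items() if k >= h)

dist = matching_distribution(S=3, per_group=2)     # M = 6 items
mean = sum(h * p for h, p in dist.items())
print(dist)
print("E(H) =", mean, " P(H >= 4) =", upper_tail(dist, 4))
```

Under random allocation each item lands in its own subgroup with probability 1/S, so E(H) = M/S. The enumeration reproduces this.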


APPENDIX 
The mean and variance of the sorting distribution 


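The rth moment of the frequency distribution characterized by expression (3), paragraph 5, is, by definition, 

$$\mu_r' = E(R^r) = \sum_{X=0}^{N} \binom{N}{X} p^X q^{N-X} \sum_{R=X}^{N} R^r \, \frac{\binom{N-X}{R-X}\binom{M-N}{N-R}}{\binom{M-X}{N-X}}, \qquad (8)$$

which can be rearranged to 

$$\mu_r' = \sum_{X=0}^{N} \binom{N}{X} \frac{p^X q^{N-X}}{\binom{M-X}{N-X}} \sum_{R=X}^{N} R^r \binom{N-X}{R-X}\binom{M-N}{N-R}. \qquad (9)$$

If we now put Y = N − R, the R-summation in (8) becomes 

$$\sum_{Y=0}^{N-X} (N-Y)^r \binom{M-N}{Y}\binom{N-X}{N-X-Y}. \qquad (10)$$

Incorporation of (10) in (8) yields 

$$\mu_r' = \sum_{X=0}^{N} \binom{N}{X} p^X q^{N-X} \sum_{Y=0}^{N-X} (N-Y)^r \, \frac{\binom{M-N}{Y}\binom{N-X}{N-X-Y}}{\binom{M-X}{N-X}},$$

and the problem now is to eliminate the Y-summations. To do this we can modify the arrangement so as to be able to make use of the general identity 

$$\sum_{Y} \binom{a}{Y}\binom{b}{m-Y} = \binom{a+b}{m}. \qquad (11)$$

For r = 1 and r = 2 the solutions are straightforward. Writing Z = Y − 1 = N − R − 1, we have 

$$\sum_{Y} Y \, \frac{\binom{M-N}{Y}\binom{N-X}{N-X-Y}}{\binom{M-X}{N-X}} = \frac{M-N}{\binom{M-X}{N-X}} \sum_{Z} \binom{M-N-1}{Z}\binom{N-X}{N-X-1-Z} = \frac{(M-N)\binom{M-X-1}{N-X-1}}{\binom{M-X}{N-X}} = \frac{(M-N)(N-X)}{M-X}. \qquad (12)$$

Similarly, 

$$\sum_{Y} Y(Y-1) \, \frac{\binom{M-N}{Y}\binom{N-X}{N-X-Y}}{\binom{M-X}{N-X}} = \frac{(M-N)(M-N-1)\binom{M-X-2}{N-X-2}}{\binom{M-X}{N-X}} = \frac{(M-N)(M-N-1)(N-X)(N-X-1)}{(M-X)(M-X-1)}. \qquad (13)$$

Therefore the first moment is 

$$E(R) = \sum_{X=0}^{N} \binom{N}{X} p^X q^{N-X} \left[ N - \frac{(M-N)(N-X)}{M-X} \right] = \sum_{X=0}^{N} \binom{N}{X} p^X q^{N-X} \, \frac{N^2 + X(M-2N)}{M-X} \qquad (14)$$

$$= N^2 \sum_{X=0}^{N} \binom{N}{X} \frac{p^X q^{N-X}}{M-X} \quad \text{when } M = 2N. \qquad (15)$$

Likewise, since R² = N² − (2N − 1)Y + Y(Y − 1), the second moment is 

$$E(R^2) = \sum_{X=0}^{N} \binom{N}{X} p^X q^{N-X} \left[ N^2 - \frac{(2N-1)(M-N)(N-X)}{M-X} + \frac{(M-N)(M-N-1)(N-X)(N-X-1)}{(M-X)(M-X-1)} \right]. \qquad (16)$$

The variance can then of course be obtained from the relation V(R) = E(R²) − E²(R). 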
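As a numerical check on (14) and (16), the sketch below evaluates the closed forms and compares them with a direct summation over the distribution. It assumes the model implied by (8) as written above:

- X ~ Binomial(N, p) items are correctly identified.
- The remaining N − X allocations are drawn at random from the M − X unidentified items.
- Y = N − R counts the wrong ones.

The function names are ours. Exact rational arithmetic is used, so the comparison is exact.

```python
from fractions import Fraction
from math import comb

def sorting_moments(M, N, p):
    """E(R) and V(R) from the closed forms (14) and (16); p should be a Fraction."""
    q = 1 - p
    m1 = m2 = Fraction(0)
    for X in range(N + 1):
        w = comb(N, X) * p**X * q**(N - X)          # binomial weight for X
        k = N - X                                     # allocations made at random
        ey = Fraction((M - N) * k, M - X) if k > 0 else Fraction(0)           # (12)
        ey2 = (Fraction((M - N) * (M - N - 1) * k * (k - 1), (M - X) * (M - X - 1))
               if k > 1 else Fraction(0))                                      # (13)
        m1 += w * (N - ey)
        m2 += w * (N * N - (2 * N - 1) * ey + ey2)
    return m1, m2 - m1 * m1

def sorting_moments_direct(M, N, p):
    """Same moments by summing over X and Y = N - R term by term, as in (10)."""
    q = 1 - p
    m1 = m2 = Fraction(0)
    for X in range(N + 1):
        wx = comb(N, X) * p**X * q**(N - X)
        total = comb(M - X, N - X)
        for Y in range(N - X + 1):
            py = Fraction(comb(M - N, Y) * comb(N - X, N - X - Y), total)
            R = N - Y
            m1 += wx * py * R
            m2 += wx * py * R * R
    return m1, m2 - m1 * m1

print(sorting_moments(10, 4, Fraction(1, 3)))
```

With p = 0 the mean reduces to the hypergeometric value N²/M, as it should for purely random sorting.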
ORTHOGONAL CONTRASTS IN SLOPE RATIO 
INVESTIGATIONS 


P. J. CLARINGBOLD 


Department of Veterinary Physiology, University of Sydney 
Sydney, Australia 


I. INTRODUCTION 


It is customary in the analysis of variance of balanced experiments 
to partition sums of squares into mean squares corresponding with 
linear, quadratic, and other components of regression of response on 
the various independent variables. Undoubtedly this is a reasonable 
approach if the overall form of the regression line is the object of study, 
but it is a useless approach if some special aspect of the curve is being 
studied, for example, the limits of an interval of linear regression. In 
this instance Lorraine [1952] devised a set of orthogonal contrasts, the 
first of which tested the linearity of three points situated at one extreme 
of the factor levels. Successive contrasts then tested whether the 
addition of further points, one per contrast, to those already tested, 
introduced significant departure from linearity. If no contrast was 
found significant (i.e., none of the n − 2 tests of linearity, the factor being applied at n 
levels), the contrasts were completed by the standard linear contrast 
over n levels. 

The problem of appropriate sets of orthogonal contrasts arises in an 
acute form when factor/response lines with only one parameter are 
encountered. Two cases are considered; in both the factor/response 
lines form a pencil of lines with common point of intersection on the 
response axis. In one the intersection point is the origin (response = 0) 
and in the other the intersection point is on the response axis above the 
origin (response > 0). Conventional description of these regression 
lines with a mean value and a slope leads to loss of efficiency and elegance 
since the parameters are in constant ratio. Experimental data of this 
type are obtained in chemical assays where the basic parameter is 
optical density per unit weight, in selection experiments where advance 
per generation is estimated, and in slope ratio assays. Some novel sets 
of contrasts are introduced in this paper in order to deal with the 
problem. 
II. NOTATION AND ASSUMPTIONS 


Orthogonal contrasts: Bold type is used to denote contrasts and, where 
possible, a descriptive symbol is employed, e.g., Sl for slope. In the 
past, two types of orthogonal contrast have been employed, the contrasts 
forming the lower rows of orthogonal matrices (apart from 
arbitrary scalar row multipliers), the first row of which is made up 
either of ones (e.g., second matrix of Table 5) or of weights proportional to 
sample sizes [Fisher, 1938]. This definition of the elements of the 
first row, together with the orthogonality restriction, ensures that the 
proper contrasts have zero expectations in the null case. The term 
contrast is extended in this paper to cover cases where the expectations 
of observations, of observation differences, or, more generally, of linear 
combinations of observations are directly proportional to certain 
constants, defined in equation (12) below. These constants therefore form 
the first row of a matrix, and the remaining rows are chosen, with the 
restriction of orthogonality, as before. 

Regression coefficients: On the occasions where the regression coefficients 
are explicitly referred to, the symbol b( ) is used, the brackets 
containing the symbol for the contrast, e.g., b(Sl) stands for the mean 
slope per unit test material. 

Elements of contrasts: Conventions similar to those of Fisher and 
Yates [1955, Table XXIII] are adhered to. Thus ξ or k stand for 
elements, λ is a scale factor, and r the replication factor for the contrast 
concerned. The mean (m) of all observations corresponding with a 
particular value of ξ or k is given by t/r, where t is the appropriate total. 

Statistical analysis: The orthogonal linear model [Kempthorne, 
1952] is employed throughout, experimental error being assumed independent 
of the treatment combinations. With the conventions 
described above, the mean square (MS) for a particular contrast is 
given by: 

$$\mathrm{MS}(\ ) = \frac{\left(\sum k t\right)^2}{r \sum k^2}, \qquad (1)$$

and the corresponding regression coefficient and its variance by: 

$$b(\ ) = \frac{\lambda \sum k t}{r \sum k^2}, \qquad (2)$$

$$V[b(\ )] = \frac{\lambda^2 V}{r \sum k^2}, \qquad (3)$$

where the sign Σ stands for summation over the values of k or ξ and the 
corresponding values of t, and V is the error variance. 
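Equations (1) to (3) translate directly into code. The sketch below applies them to the advance contrast j = 1, …, 8 of the selection example; the function name `contrast_stats` is ours. The relative-sensitivity figures of Table 1 serve as totals (r = 1), and the departures-from-linearity mean square 0.131 serves as V.

```python
def contrast_stats(k, totals, r, lam=1, error_variance=None):
    """Mean square (1), regression coefficient (2) and, if V is given, its variance (3)."""
    skt = sum(ki * ti for ki, ti in zip(k, totals))   # sum of products, sum of k*t
    rsk2 = r * sum(ki * ki for ki in k)                 # r times sum of squared elements
    ms = skt ** 2 / rsk2
    b = lam * skt / rsk2
    vb = None if error_variance is None else lam ** 2 * error_variance / rsk2
    return ms, b, vb

# Advance contrast of the selection example: k = j, totals = Table 1 figures, r = 1
y = [0.438, 0.532, 1.937, 1.948, 1.646, 2.282, 2.699, 3.405]
ms, b, vb = contrast_stats(range(1, 9), y, r=1, error_variance=0.131)
print(f"MS = {ms:.3f}, b = {b:.4f}, V(b) = {vb:.7f}")
```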

Direct product: It is also assumed that orthogonal experiments of 
the factorial type (N factors) are used. In this case the contrasts for 
each factor may be used to generate contrasts for the whole experiment 
by means of the direct product of the matrices of contrasts. It is, of 
course, assumed that appropriate rows and columns of the product 
matrix are deleted in the case of fractional replications and Latin square 
designs. The direct product is defined [cf. Murnaghan, 1938] as 


$$K = \prod_{i=1}^{N} K_i = K_1 \otimes K_2 \otimes \cdots \otimes K_N, \qquad (4)$$

where an element of K, 

$$k_{(s_i)(t_i)} = \prod_{i=1}^{N} k^{(i)}_{s_i t_i}, \qquad (5)$$

is the product of elements of the N matrices K_i, selected so that the 
row-defining and column-defining integer sets of (5), namely (s_i) and 
(t_i), take the required values. The integers s_i and t_i are residues mod (n_i), 
i.e., the ith factor is applied at n_i levels, giving a matrix of contrasts 
of order n_i. For example: 


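Let 

$$K_1 = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}, \qquad K_2 = \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}. \qquad (6)$$

Then, with rows and columns labelled by the integer pairs 00, 01, 10, 11, 

$$K = \begin{pmatrix} k_{00,00} & k_{00,01} & k_{00,10} & k_{00,11} \\ k_{01,00} & k_{01,01} & k_{01,10} & k_{01,11} \\ k_{10,00} & k_{10,01} & k_{10,10} & k_{10,11} \\ k_{11,00} & k_{11,01} & k_{11,10} & k_{11,11} \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 & 2 \\ -2 & 1 & -2 & 1 \\ -1 & -2 & 1 & 2 \\ 2 & -1 & -2 & 1 \end{pmatrix}, \qquad (7)$$

where k_{00,00} = k^{(1)}_{00} × k^{(2)}_{00} = 1 × 1, k_{00,01} = k^{(1)}_{00} × k^{(2)}_{01} = 1 × 2, etc. 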

The direct product (otherwise known as the Kronecker product) 
has been used in a number of statistical investigations: in the partition 
of χ² [Lancaster, 1951], in multiple regression analysis [Cornish, 1957], 
and in the analysis of block experiments [Tocher, 1952]; and it is, of 
course, implied in Yates' [1937] treatment of interaction single degrees 
of freedom in the analysis of variance. The operation is thus of central 
importance in generating interaction contrasts from contrasts used in 
the first instance to partition main effect sums of squares or χ². It is 
defined ab initio in the present paper since the form used, a form very 
suitable for use in the analysis of experiments, is not given elsewhere. 
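NumPy's `np.kron` computes the same direct product. The sketch below rebuilds (7) from the factor matrices of (6) as we read them. It then checks, element by element, that each entry is the product prescribed by (5), with the labels 00, 01, 10, 11 mapped to indices 0 to 3.

```python
import numpy as np

# Factor matrices of example (6)
K1 = np.array([[1, 1], [-1, 1]])
K2 = np.array([[1, 2], [-2, 1]])

K = np.kron(K1, K2)   # direct (Kronecker) product; rows/columns ordered 00, 01, 10, 11
print(K)

# Element check against (5): k_{(s1 s2)(t1 t2)} = K1[s1, t1] * K2[s2, t2]
for s1 in range(2):
    for s2 in range(2):
        for t1 in range(2):
            for t2 in range(2):
                assert K[2 * s1 + s2, 2 * t1 + t2] == K1[s1, t1] * K2[s2, t2]
```

Rows 10 and 11 of the result are the divergence and intersection contrasts listed for the 2 × 2 assay in Table 6. Because the rows of each factor matrix are orthogonal, so are the rows of their product.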


III. EXAMPLES 
1. Selection experiment 


During the past six years a selection experiment has been carried 
out in this laboratory [for details see Biggers and Claringbold, 1955]. 
In brief, from an initial population of randomly bred albino mice, two 
lines are selected for increased (reduced) sensitivity to the local administration 
of oestrogens. A constant selection intensity in both up and 
down lines is applied in each generation, the animals of a subsequent 
generation being derived from the 33 per cent of parental mice whose 
daughters scored highest or lowest in a standard series of oestrogen 
tests. The standard dose series consisted of four doses given by the 
expression a·2^b, where b = −1.5, −0.5, 0.5 and 1.5, that is, integrally 
spaced on the log₂ dose scale, the constant a being chosen as the expected 
median effective dose of the generation and line under test. Since 
each mouse responds quantally in each test, the number of positive 
responses gives a score, the mean of a set of such scores giving the 
parent a score. Selection via the parents must be employed since the 
test mice are ovariectomised females. 


TABLE 1 

EXPERIMENTAL RESULTS AND BASIC CALCULATIONS FOR THE FIRST EIGHT 
GENERATIONS OF SELECTION 

Generation   log_e relative    Weight    Cumulative totals     Linearity test 
   (j)       sensitivity (y)    (w)       Σy²        Σjy       (−d_j y_{j+1}) 
    1            0.438          12.11     0.1918      0.438        0.266 
    2            0.532          11.89     0.4749      1.502        3.228 
    3            1.937          11.66     4.2268      7.313        6.818 
    4            1.948          13.14     8.0215     15.105        9.876 
    5            1.646          11.18    10.7309     23.335       20.918 
    6            2.282          13.82    15.9384     37.027       35.087 
    7            2.699          12.42    23.2230     55.920       59.587 
    8            3.405          11.57    34.8170     83.160 

w̄ = 12.22; d_j defined in Table 2. 
Although the mean log-sensitivity of both selected lines varies with 
generations, these changes run parallel with those in a randomly mated 
control colony. The results of the first eight generations of selection 
may therefore be summarised (Table 1) in one log-relative sensitivity 
figure, i.e., the difference between the two log-mean sensitivity figures 
per generation. The variance of this estimate depends on the factors 
which govern the precision of a quantal bioassay, such as the current 
dose/response slope, the number of animals tested, and the degree of 
success in predicting the dose series to use. The reciprocal of this 
variance is termed the within-generation weight. For a more complete 
discussion of the biometrical aspects of this investigation see Biggers 
[1951]. 

Owing to technical limitations it is impossible to replicate the above 
programme. As a result the only estimate of the between-generation 
variance is the line × generation interaction, which is thus confounded 
with high-order curvature in advance due to selection. 

The selection results appear by generations, and as a result a method 
of analysis is required which (i) gives the best estimate of the advance, 
the change in log-mean sensitivity per selective increment, and (ii) 
indicates whether the results of the jth generation, j > 1, are collinear 
with those preceding. In Table 2 a set of contrasts is defined for this 
purpose. It should be noted that the first row estimates 2a when 
applied to relative sensitivity figures, and it is expected that in the 
simplest genetic situation with additivity (on the log scale) the relative 
sensitivity at the jth generation should be 2ja. The remaining rows 
have expectation zero with this genetic model. Since each successive 
test of linearity is independent of the others, little effort is required to 
bring the analysis up to date when another set of results appears. The 
analysis of variance and estimation of the regression coefficients are 
completed in the standard way using equations (1) and (2), Table 3. 

TABLE 2 
ORTHOGONAL CONTRASTS USED IN THE ANALYSIS OF VARIANCE OF THE DATA OF 
TABLE 1 
(The contrasts form the rows of an 8 × 8 matrix.) 

Name    Contrast 
  j     1     2      3       4       5       6       7        8 
  l₁    1   −1/2     0       0       0       0       0        0 
  l₂    1     2    −5/3      0       0       0       0        0 
  l₃    1     2      3    −14/4      0       0       0        0 
  l₄    1     2      3       4    −30/5      0       0        0 
  l₅    1     2      3       4       5    −55/6      0        0 
  l₆    1     2      3       4       5       6    −91/7       0 
  l₇    1     2      3       4       5       6       7    −140/8 

d_j, the (j + 1)th diagonal element of the matrix, is given by −d_j = (1² + 2² + ... + j²)/(j + 1), 
e.g., −d₃ = (1 + 4 + 9)/(3 + 1). 
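The matrix of Table 2 follows a simple rule, so it can be generated for any number of generations and its orthogonality checked directly. The sketch below uses exact fractions; the function name is ours. Note that `Fraction` prints −14/4 in lowest terms, as −7/2.

```python
from fractions import Fraction

def linearity_contrasts(n):
    """Rows of the Table 2 matrix for n generations: the advance row j = 1..n,
    then l_1..l_{n-1} with l_j = (1, 2, ..., j, d_j, 0, ..., 0)."""
    rows = [[Fraction(j) for j in range(1, n + 1)]]               # advance row
    for j in range(1, n):
        d_j = -Fraction(sum(i * i for i in range(1, j + 1)), j + 1)
        rows.append([Fraction(i) for i in range(1, j + 1)] + [d_j]
                    + [Fraction(0)] * (n - j - 1))
    return rows

rows = linearity_contrasts(8)
for u in range(len(rows)):                      # every pair of rows is orthogonal
    for v in range(u + 1, len(rows)):
        assert sum(a * b for a, b in zip(rows[u], rows[v])) == 0
for row in rows[1:]:
    print(" ".join(str(x) for x in row))
```

The sums of squared elements of these rows are the divisors in the fourth column of Table 3 (e.g., 204 for the advance row).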


TABLE 3 

ANALYSIS OF VARIANCE OF THE DATA OF TABLE 1 USING THE ORTHOGONAL 
CONTRASTS OF TABLE 2 

Source of variation          D.F.    Sum of      Sum of squared    Mean 
                                     products    elements          square 
Advance (j)                    1      83.160       204.00          33.900 
Departures from linearity     (7)                                  (0.131) 
  l₁                           1       0.172         1.25           0.024 
  l₂                           1      −1.726         7.78           0.383 
  l₃                           1       0.495        26.25           0.009 
  l₄                           1       5.229        66.00           0.414 
  l₅                           1       2.417       139.03           0.042 
  l₆                           1       1.940       260.00           0.015 
  l₇                           1      −3.667       446.25           0.030 
Within-generation 
  variance (approx.)        1400                                    0.0821 


The cumulative totals indicated in Table 1 facilitate evaluation of equation 
(1) at each stage. For ease in analysis a constant weight is assumed, 
the mean weight being used, and since the variation in weight is small, 
the overall loss of information is slight. 

Although no direct test of significance is possible, it is obvious that 
the response to selection has been linear. If the response had been a 
simple quadratic curve, the departures from linearity would take the 
same sign and increase in magnitude. The seven mean squares for 
departures from linearity are homogeneous, χ² = 7.20, 0.5 > P > 0.3, 
and their average is not significant when compared with the 
within-generation variance, F(7, 1400) = 1.60, 0.2 > P > 0.1. 

The best estimate of the increase in relative sensitivity per generation 
of selection (2a) is 

2â = 83.160/204 = 0.4076, 

with variance 

V(2â) = 0.131/204 = 0.000 6422, 

using the deviations from linearity as error, and 95 per cent fiducial 
limits are constructed in the usual way: 

â = 0.2038 (0.1738 to 0.2338). 
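The cumulative-totals scheme of Table 1 and the estimates above can be reproduced in a few lines. This sketch makes three assumptions:

- A constant weight is used, as in the text.
- The 5 per cent Student value for 7 degrees of freedom is taken as 2.365 from standard tables; it is hard-coded, not computed.
- The mean of the seven departure mean squares is used as error.

```python
import math

y = [0.438, 0.532, 1.937, 1.948, 1.646, 2.282, 2.699, 3.405]   # Table 1, log_e relative sensitivity
n = len(y)

cum = 0.0        # running total of j * y_j (the "cumulative totals" column)
sum_sq_j = 0     # running 1^2 + ... + j^2
dev_ms = []
for j in range(1, n):
    cum += j * y[j - 1]
    sum_sq_j += j * j
    d_j = -sum_sq_j / (j + 1)
    sp = cum + d_j * y[j]                 # sum of products for l_j (y[j] is y_{j+1})
    sk2 = sum_sq_j + d_j ** 2             # sum of squared elements of l_j
    dev_ms.append(sp ** 2 / sk2)          # equation (1) with r = 1
advance_sp = cum + n * y[n - 1]           # 83.160
advance_sk2 = sum_sq_j + n * n            # 204

two_a = advance_sp / advance_sk2          # equation (2): estimate of 2a
error_ms = sum(dev_ms) / len(dev_ms)      # departures from linearity as error
var_two_a = error_ms / advance_sk2        # equation (3)
t_7 = 2.365                               # assumed tabulated t(0.975, 7 d.f.)
a = two_a / 2
half = t_7 * math.sqrt(var_two_a) / 2
print(f"2a = {two_a:.4f}, V = {var_two_a:.7f}, a = {a:.4f} ({a - half:.4f} to {a + half:.4f})")
```

The printed limits may differ from those above in the last digit because of rounding of intermediate values.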
2. Chemical experiment 


The colorimetric reaction of the natural oestrogens with sulphuric 
acid has been studied in this department. I am indebted to Dr. R. I. 
Cox for permission to use the unpublished results of an experiment 
(Table 4) as illustrative material. The purposes of the experiment 
were (i) to determine conditions under which "blank" was zero, i.e., 
conditions under which the weight/optical density lines passed through 
the origin, and (ii) to determine conditions (subject to (i)) giving maximum 
optical density per microgram of test substance. A summary of the 
experimental conditions is given at the foot of Table 4; the experiment 
is of the 2 × 4 × 3 type with two independent observations per treatment 
combination. 

TABLE 4 

THE EFFECT OF p-BENZOQUINONE AND ACID CONCENTRATION ON THE WEIGHT/ 
OPTICAL DENSITY LINE OBTAINED WITH OESTRIOL 

                                  Optical density at 512.5 mμ 
Factor levels                    Observed     Expected                      Contrasts 
p-benzo-   Acid   Weight          (10⁻³)       (10⁻³)      +     |−|      Sl      Bl     Cu 
quinone    (n)    (units 
(m)               of 2.25) 
0          2      1              77    76       77.8      153     1 
                  2             149   153      151.2      302     4 
                  3             227   221      224.5      448     6     2101      18     −3 
           3      1              77    78       78.0      155     1 
                  2             162   161      160.5      323     1 
                  3             243   241      243.0      484     2     2253     −25     −7 
           4      1              84    87       82.2      171     3 
                  2             166   165      168.7      331     1 
                  3             254   256      255.3      510     2     2363      −5     19 
           5      1              87    90       90.4      177     3 
                  2             178   174      176.0      352     4 
                  3             266   257      261.6      523     9     2450      14     −4 
1          2      1              51    51       51.6      102     0 
                  2             127   132      129.3      259     5 
                  3             206   212      207.1      418     6     1874    −169      2 
           3      1              74    75       74.7      149     1 
                  2             164   157      159.2      321     7 
                  3             240   242      243.6      482     2     2237     −47    −11 
           4      1              90    90       87.6      180     0 
                  2             174   183      177.5      357     9 
                  3             270   269      267.4      539     1     2511      −1      5 
           5      1              88    92       90.2      180     4 
                  2             180   184      184.3      364     4 
                  3             281   279      278.4      560     2     2588     −36     12 

m defines levels: 200m mg/l p-benzoquinone + 20 g/l p-hydroquinone; n defines levels: 20n ml 
H₂O + 190 ml H₂SO₄. 


TABLE 5 
ANALYSIS OF VARIANCE OF THE DATA OF TABLE 4 USING THE DIRECT PRODUCT 
NOTATION* 

Source of             D.F.    Sum of      Sum of elements    Mean 
variation                     products    squared            square 
Slopes                (8) 
  Sl                    1      18 377          224           1 507 652.4 
  Sl × A_L              1       3 573        1 120              11 398.5*** 
  Sl × A_Q              1        −351          224                 550.0*** 
  Sl × A_C              1         −89        1 120                   7.1 
  Sl × Q                1          43          224                   8.3 
  Sl × A_L × Q          1       1 259        1 120               1 415.3*** 
  Sl × A_Q × Q          1        −221          224                 218.0*** 
  Sl × A_C × Q          1        −127        1 120                  14.4 
Blanks                (8) 
  Bl                    1        −251          336                 187.5*** 
  Bl × A_L              1         453        1 680                 122.1*** 
  Bl × A_Q              1         −95          336                  26.9 
  Bl × A_C              1         −69        1 680                   2.8 
  Bl × Q                1        −255          336                 193.5*** 
  Bl × A_L × Q          1         437        1 680                 113.7** 
  Bl × A_Q × Q          1        −219          336                 142.7*** 
  Bl × A_C × Q          1          59        1 680                   2.1 
Curvatures              8                                            7.6 
Between duplicates     24                                            8.58 

* A_L, A_Q, A_C: linear, quadratic and cubic contrasts among the four acid levels; Q: contrast 
between the two p-benzoquinone levels. 
** 0.01 > P > 0.001. 
*** P < 0.001. 
A matrix of contrasts suitable for examining the features of the 
weight/optical density lines is given below, using equations (11) and (14): 

Slope        Sl    k_S    1    2    3 
Blank        Bl    k₁     4    1   −2        (8) 
Curvature    Cu    k₂     1   −2    1 

Regression of optical density on the constants of the first row, in the 
absence of significant blank and curvature, gives the best estimate of 
optical density per unit weight. Use of the contrast −1, 0, 1 instead 
of 1, 2, 3 under these conditions reduces precision from 
100 per cent to 14.3 per cent, a figure computed from the ratio of the 
sums of squared coefficients (see equation (3)), i.e., (2/14) × 100 per cent. 
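Matrix (8), the efficiency figure and the scalar products of Table 4 can be checked with plain arithmetic. The sketch below uses the duplicate sums for p-benzoquinone level 0, acid level 2.

```python
Sl = [1, 2, 3]     # slope, equation (11)
Bl = [4, 1, -2]    # blank, equation (14) with rho = 3/4, scaled by 4
Cu = [1, -2, 1]    # curvature

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

assert dot(Sl, Bl) == dot(Sl, Cu) == dot(Bl, Cu) == 0   # mutual orthogonality

# Relative precision of the slope estimate from -1, 0, 1 instead of 1, 2, 3
efficiency = dot([-1, 0, 1], [-1, 0, 1]) / dot(Sl, Sl)
print(f"efficiency = {100 * efficiency:.1f} per cent")

# Duplicate sums for quinone 0, acid level 2 (Table 4)
sums = [153, 302, 448]
print(dot(Sl, sums), dot(Bl, sums), dot(Cu, sums))   # 2101 18 -3
```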

The initial stages of the analysis are also shown in Table 4. The 
columns headed + and |−| give the sum and the absolute value of the 
difference of duplicate observations. Half the sum of squares of the 
duplicate differences, divided by its 24 degrees of freedom, gives the estimate of experimental error used in 
Table 5. The remaining columns are obtained by forming the scalar 
products of the duplicate sums with the coefficients of the contrasts: 

e.g., 2101 = (1 × 153) + (2 × 302) + (3 × 448). 

[Figure 1: expected optical density plotted against WEIGHT, with lines labelled ACID and QUINONE.] 

FIGURE 1. The relation of expected optical density to acid dilution and presence 
of added p-benzoquinone. Units as in Table 4. The expectations plotted 
are obtained from the regression coefficients corresponding with 
significant items in the analysis of variance (Table 5). 


Differences between these scalar products may be examined using the 
standard orthogonal polynomial contrasts [Fisher and Yates, 1955]. 

In accordance with expectation no significant evidence of curvature 
in the weight/optical density lines is found. A number of significant 
blank differences is observed and these are readily interpreted in the 
light of Fig. 1 or the relevant sums in Table 4. Clearly blank is strongly 
non-linearly determined by acid concentration in the presence of added 
p-benzoquinone (Q = 1) while little or no blank is found in the absence 
of this substance (Q = 0). Estimates of slope using the coefficients 
1, 2, 3 will thus be unbiased only where Q = 0, where it is found that 
maximum optical density per microgram may be developed under 
conditions where the acid dilution is greater than in the present in- 
vestigation. 
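The sixteen slope and blank items of Table 5 are direct products of the within-line contrasts with the factor contrasts, so they can be generated with `np.kron`. The sketch assumes orthogonal-polynomial contrasts for the four acid levels: (−3, −1, 1, 3), (1, −1, −1, 1) and (−1, 3, −3, 1). It assumes (−1, 1) for quinone. These choices reproduce the sums of products and divisors printed in Table 5, and the row order follows the table.

```python
import numpy as np

# Scalar products from Table 4, ordered Q = 0 (acid 2, 3, 4, 5) then Q = 1 (acid 2, 3, 4, 5)
sl = np.array([2101, 2253, 2363, 2450, 1874, 2237, 2511, 2588])
bl = np.array([18, -25, -5, 14, -169, -47, -1, -36])
cu = np.array([-3, -7, 19, -4, 2, -11, 5, 12])

Q = np.array([[1, 1], [-1, 1]])            # overall, quinone contrast
A = np.array([[1, 1, 1, 1],                # overall
              [-3, -1, 1, 3],              # A_L (linear in acid level)
              [1, -1, -1, 1],              # A_Q (quadratic)
              [-1, 3, -3, 1]])             # A_C (cubic)
K = np.kron(Q, A)   # rows: I, A_L, A_Q, A_C, Q, A_L.Q, A_Q.Q, A_C.Q

def anova_rows(scores, k_sq, r=2):
    """(sum of products, sum of squared elements, mean square) for each row of K.
    k_sq is the sum of squared within-line coefficients (14 for Sl, 21 for Bl)."""
    sp = K @ scores
    denom = r * k_sq * (K ** 2).sum(axis=1)
    return [(int(s), int(d), float(s) ** 2 / int(d)) for s, d in zip(sp, denom)]

labels = ["", "xA_L", "xA_Q", "xA_C", "xQ", "xA_LxQ", "xA_QxQ", "xA_CxQ"]
for name, scores, k_sq in (("Sl", sl, 14), ("Bl", bl, 21)):
    for label, (s, d, ms) in zip(labels, anova_rows(scores, k_sq)):
        print(f"{name}{label:8s} {s:7d} {d:6d} {ms:13.1f}")

# Pooled curvature mean square: each Cu contrast has divisor r * (1 + 4 + 1) = 12
curvature_ms = float((cu ** 2).sum()) / 12 / len(cu)
print(f"Curvatures (8 d.f.): {curvature_ms:.1f}")
```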


3. Slope ratio assay 


In some experimental investigations it is found that dose/response 
lines obtained under different conditions all appear to radiate from a 
non-zero intercept on the response axis. As an illustration the slope 
ratio assay [Finney, 1952, p. 206] is chosen. In the analysis of the slope 
ratio assay Finney uses a pair of mutually non-orthogonal contrasts to 
estimate separately the slope of each regression line. The ratio of these 
estimates gives the desired relative potency estimate. 

TABLE 6 

ORTHOGONAL CONTRASTS USED IN THE ANALYSIS OF THE (2 × 2) + 1 AND THE 
2 × 2 SLOPE RATIO EXPERIMENTS WITH NON-ZERO BLANK 

Assay type              (2 × 2) + 1                 |   2 × 2 
Dose                    0     1     2     M    2M   |   1     2     M    2M 
Divergence (Di)         0    −1    −2     1     2   |  −1    −2     1     2 
Intersection            0     2    −1    −2     1   |   2    −1    −2     1 
Blank                  −2     2    −1     2    −1   |   Not available 
Totals                167   400   646   340   489 

N.B. Finney [1952] uses contrasts defined by 2.5 Sl + 3.5 Di in place of contrasts Sl and Di. 

In investigations more general than an assay, where a pencil of lines 
is studied, a test of significance of slope differences is all that is usually 
required. With many lines, separate fitting of a regression coefficient 
for each line by multiple regression methods is time consuming owing 
to non-orthogonality of the coefficients, although contrasts amongst 
the slope differences are orthogonal. In the example, a matrix of comparisons 
suitable for examining the differences between the five group 
totals of the (2 × 2) + 1 design is given in Table 6. Comparison with 
Finney's table shows that the mutually non-orthogonal contrasts are 
replaced by orthogonal contrasts, one of which estimates the mean 
slope of the regression lines and the other the difference between the 
slopes, termed the divergence. 

It is seen that only three contrasts (including the mean) involve the 
control total. Inclusion of the control increases the precision of the 
mean slope and provides a test, termed "blank", of the collinearity of 
the control point and the mean regression line. In experiments where 
the control point is omitted, the blank test is lost together with some 
precision in the estimate of mean slope. It should be noted that the 
control point is without effect on the precision of slope differences. The 
analysis of the data, summarised as group totals at the foot of Table 6, 
is completed in Table 7, and in addition the slope ratio is estimated to 
show the connection between the present approach and that of Finney 
[1952]. 

TABLE 7 
ANALYSIS OF THE DATA OF FINNEY [1952, p. 206] USING THE CONTRASTS OF 
TABLE 6 

Source of variation     D.F.    λ     Sum of products    rΣk²    Mean square 
Mean                      1     –          2042            20     208 488.2 
Slope                     1     5          2798           280      27 960.0 
Divergence                1                −374            40       3 496.9 
Intersection              1                 −37            40          34.2 
Within-group error       15 

Estimation of relative potency (M): M = [b(Sl) + b(Di)]/[b(Sl) − b(Di)], where b(Sl) = 5 × 2798/280 = 
49.96 and b(Di) = −374/40 = −9.35. Thus M = 0.6847. 
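Table 7 can be recomputed from the group totals of Table 6. The mean-slope contrast used here, (−6, −1, 4, −1, 4), is our reconstruction from modification (i) of Section IV with n = 2, s = 2 and z = 2, which gives c = 1.2. It is scaled by λ = 5 to give integers, and it reproduces the printed sum of products 2798 and divisor 280.

```python
# Group totals from Table 6 (dose 0, 1, 2, M, 2M); r = 4 observations per group
totals = [167, 400, 646, 340, 489]
r = 4

def contrast(k, lam=1):
    """Sum of products, r * sum k^2, mean square (1) and coefficient b (2)."""
    skt = sum(a * b for a, b in zip(k, totals))
    rsk2 = r * sum(a * a for a in k)
    return skt, rsk2, skt ** 2 / rsk2, lam * skt / rsk2

mean = [1, 1, 1, 1, 1]
slope = [-6, -1, 4, -1, 4]   # reconstructed: (0, 1, 2, 1, 2) - 1.2, times 5
div = [0, -1, -2, 1, 2]
inter = [0, 2, -1, -2, 1]

for name, k, lam in (("Mean", mean, 1), ("Slope", slope, 5),
                     ("Divergence", div, 1), ("Intersection", inter, 1)):
    skt, rsk2, ms, _ = contrast(k, lam)
    print(f"{name:12s} {skt:6d} {rsk2:4d} {ms:10.1f}")

b_sl = contrast(slope, lam=5)[3]
b_di = contrast(div)[3]
potency = (b_sl + b_di) / (b_sl - b_di)
print(f"b(Sl) = {b_sl:.2f}, b(Di) = {b_di:.2f}, M = {potency:.4f}")
```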
It may be noted that the 4 × 4 matrix for the 2 × 2 slope ratio 
assay is composed of two pairs of rows obtained from two different 
direct products: 


IV. SOME GENERAL CONTRASTS 

One matrix of general contrasts is given in Table 2. While this 
matrix is ideal for testing the collinearity of sequentially added data, 
assumed proportional to the natural numbers, it is not of general use. 
Since all of the dose/response lines considered in this paper are linear, 
the only rows of the matrices of orthogonal polynomial coefficients 
which need to be modified for use in slope ratio investigations are the 
first two, namely: 

$$\xi_{0j} = 1, \qquad (9)$$

$$\xi_{1j} = (2j - n - 1)/2, \quad j = 1 \text{ through } n. \qquad (10)$$


In the case of dose/response lines passing through the origin, the 
contrast (9) is replaced by: 

$$k_{Sj} = j. \qquad (11)$$

Experiments are often performed, however, where the dose levels are 
given by the equation: 

$$k_{Sj} = 1 + (j - 1)(z - 1), \quad \text{i.e., slope,} \qquad (12)$$

giving the series of doses: 

$$1,\; z,\; 2z - 1,\; 3z - 2,\; 4z - 3, \;\text{and so on.} \qquad (13)$$

Thus the lowest dose is assumed to be unit distance from the origin and the 
subsequent doses are equally spaced using the interval (z − 1). 
Clearly (11) is a special case of (12) when z = 2. 

Making the transformation from ξ₀ to k_S requires a transformation 
of ξ₁ to k₁ to preserve orthogonality of the contrasts. Imposing the condition 
Σ_j k_{Sj} k_{1j} = 0, this is found to give: 

$$k_{1j} = 1 - (j - 1)\rho, \quad \text{i.e., blank,} \qquad (14)$$

where ρ = a/b, with 

$$a = 3[n(z - 1) - (z - 3)], \qquad b = 2n^2(z - 1) - 3n(z - 2) + (z - 4). \qquad (15)$$
TABLE 9 
CONSTRUCTION OF ORTHOGONAL CONTRASTS IN THE (3 × 3) + 1 AND THE 3 × 3 
EXPERIMENTS FROM THE DIRECT PRODUCT* 

                              Treatment 
Contrast      Control |    Level 0     |    Level 1      |    Level 2 
Dose             0    |   1   3   5    |   1   3   5     |   1   3   5 
I × Sl           –    |   1   3   5    |   1   3   5     |   1   3   5 
I × Bl           –    |  13   4  −5    |  13   4  −5     |  13   4  −5 
I × Cu           –    |   1  −2   1    |   1  −2   1     |   1  −2   1 
A_L × Sl         –    |  −1  −3  −5    |   0   0   0     |   1   3   5 
A_L × Bl         –    | −13  −4   5    |   0   0   0     |  13   4  −5 
A_L × Cu         –    |  −1   2  −1    |   0   0   0     |   1  −2   1 
A_Q × Sl         –    |   1   3   5    |  −2  −6 −10     |   1   3   5 
A_Q × Bl         –    |  13   4  −5    | −26  −8  10     |  13   4  −5 
A_Q × Cu         –    |   1  −2   1    |  −2   4  −2     |   1  −2   1 

Modifications: delete the first two rows above and replace them by the following rows. 

(3 × 3) + 1 experiment 
Slope          −27    | −17   3  23    | −17   3  23     | −17   3  23 
Blank          −36    |  13   4  −5    |  13   4  −5     |  13   4  −5 

3 × 3 experiment 
Mean             –    |   1   1   1    |   1   1   1     |   1   1   1 
Slope            –    |   1   3   5    |   1   3   5     |   1   3   5 

* z = 3, and dose/response lines are expected to radiate from a non-zero intercept. 
When z = 2: 

$$\rho = \frac{3}{2(n - 1)}. \qquad (16)$$

In Table 8 blank contrasts (k₁) are given for various values of the 
parameters z and n. 

Both slope and blank contrasts are orthogonal to the remaining 
curvature terms of the matrices, as is obvious from (13) and (14): both are 
linear in j, and every orthogonal polynomial of degree two or higher is 
orthogonal to any linear function of j. We therefore let: 

$$k_{ij} = \xi_{ij} \quad \text{for } i > 1. \qquad (17)$$
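Equations (12), (14) and (15) can be coded directly and checked for orthogonality with exact fractions. The sketch below (function names ours) reproduces two blank contrasts used in this paper. For n = 3, z = 3 it gives 13, 4, −5 of Table 9. For n = 3, z = 2 it gives 4, 1, −2 of matrix (8).

```python
from fractions import Fraction

def slope_contrast(n, z):
    """k_S,j = 1 + (j - 1)(z - 1), equation (12)."""
    return [1 + (j - 1) * (z - 1) for j in range(1, n + 1)]

def blank_contrast(n, z):
    """k_1,j = 1 - (j - 1) * rho, equation (14), with rho = a/b from (15)."""
    a = 3 * (n * (z - 1) - (z - 3))
    b = 2 * n * n * (z - 1) - 3 * n * (z - 2) + (z - 4)
    rho = Fraction(a, b)
    return [1 - (j - 1) * rho for j in range(1, n + 1)]

print([13 * x for x in blank_contrast(3, 3)])   # scaled to integers
print([4 * x for x in blank_contrast(3, 2)])
```

For z = 2 the value of ρ reduces to 3/[2(n − 1)], equation (16). The test below also confirms Σ k_S k₁ = 0 over a range of n and z.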


The terms divergence and intersection are reserved for linear combinations 
of slopes and blanks with coefficients summing to zero. Intersection 
is used since contrasts so constructed test whether dose/response 
lines intersect on the response axis. Divergence is used since it is simply 
related to the angle of divergence of the dose/response lines as they 
radiate from a common intersection point. The sum of the mean slope 
of all dose/response lines and a divergence coefficient is the tangent 
of the angle between the dose axis and the dose/response line specified. 

In the general experiment of either the (sn + 1) or (sn) design, i.e., 
s dose/response lines each of n points, where the lines are expected to 
radiate from a common intercept, the direct product does not give the 
desired contrasts for calculation of mean slope and blank. The modifications 
required are: 

(i) Mean slope, (ns + 1) design: subtract c = ns[2 + (n − 1)(z − 1)]/[2(ns + 1)] from the control 
value (0) and from the contrast k_S, which is, of course, repeated s times. (Here c is the mean of all 
the slope coefficients, including the 0 at the control, so the modified contrast sums to zero.) 

(ii) Mean slope, (ns) design: use s repetitions of the contrast k_S applicable to n levels. 

(iii) Blank test, (ns + 1) design only: the contrast has the value −s Σ k₁ at the control point, and it is a 
copy of k₁ elsewhere for each dose/response line. 


An example of the method is given in Table 9. 
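Modifications (i) and (iii) can be checked the same way. The sketch below builds the mean-slope and blank contrasts of an (ns + 1) design, control first; the function names are ours. It reproduces the (3 × 3) + 1 rows of Table 9 after scaling by 10 and 13, and the blank row of Table 6 after scaling by 2.

```python
from fractions import Fraction

def blank_k1(n, z):
    """k_1 from equations (14) and (15)."""
    a = 3 * (n * (z - 1) - (z - 3))
    b = 2 * n * n * (z - 1) - 3 * n * (z - 2) + (z - 4)
    rho = Fraction(a, b)
    return [1 - (j - 1) * rho for j in range(1, n + 1)]

def design_contrasts(n, s, z):
    """Mean-slope and blank contrasts for the (ns + 1) design, control value first."""
    k_s = [1 + (j - 1) * (z - 1) for j in range(1, n + 1)]
    c = Fraction(n * s * (2 + (n - 1) * (z - 1)), 2 * (n * s + 1))   # modification (i)
    slope = [0 - c] + [k - c for _ in range(s) for k in k_s]
    k1 = blank_k1(n, z)
    blank = [-s * sum(k1)] + [k for _ in range(s) for k in k1]      # modification (iii)
    return slope, blank

slope, blank = design_contrasts(3, 3, 3)
print([10 * x for x in slope])
print([13 * x for x in blank])
```

Both modified contrasts sum to zero, so they are orthogonal to the mean. They are also orthogonal to each other.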


V. DISCUSSION 


Apart from examples of the method of analysis of non-orthogonal 
slope ratio assays with arbitrary doses, Finney [1952] confines his 
discussion to balanced 2k or 2k + 1 designs with doses equally spaced 
and with z = 2. The 2k + 1 design is advocated since an additional 
validity test is provided, at the price, however, of reducing the numbers 
of test objects available at the remaining 2k points. In factorial 
designs of the nk type there seems little need for the additional point, 
i.e., for the nk + 1 design, since the information supplied by it is somewhat 
minor and irrelevant. The additional point gives no information 
about the points at issue in a factorial experiment, namely, are the 
various treatments causing slope differences? It merely gives a validity 
test, testing whether the mean regression line remains linear down to the response 
axis, and increases the precision of the mean slope. 

The coefficients drawn up in Table 8 are a selected group taken 
from a comprehensive tabulation of k₁ carried out using an automatic 
computer. The author is prepared to send extracts of this table on 
request. 

I am indebted to Professor C. W. Emmens for helpful criticism 
during the course of this study which was supported by grants from 
the New South Wales State Cancer Council. 
137 NOTE: A Confidence Interval on the Abscissa 
of the Point of Intersection of Two 
Fitted Linear Regressions 


Marvin A. KASTENBAUM 
Oak Ridge National Laboratory 
Oak Ridge, Tennessee, U. S. A. 


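Let Y₁ = a₁ + b₁x and Y₂ = a₂ + b₂x be two fitted linear regressions 
based on n₁ and n₂ observations respectively, where the random variables 
Y_j (j = 1, 2), upon which the observations are made, are normally and 
independently distributed with means E(Y_j) = α_j + β_j x and common 
variance σ². The abscissa of the point of intersection of these two lines 
is estimated by 

$$\hat{X} = \frac{a_2 - a_1}{b_1 - b_2}, \qquad (1)$$

where (a₂ − a₁) and (b₁ − b₂) are distributed jointly as a bivariate 
normal distribution with means (α₂ − α₁) and (β₁ − β₂), and 

$$V(a_2 - a_1) = \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} + \frac{\bar{x}_1^2}{S_1} + \frac{\bar{x}_2^2}{S_2} \right), \qquad (2)$$

$$V(b_1 - b_2) = \sigma^2 \left( \frac{1}{S_1} + \frac{1}{S_2} \right), \qquad (3)$$

$$\mathrm{Cov}[(a_2 - a_1), (b_1 - b_2)] = \sigma^2 \left( \frac{\bar{x}_1}{S_1} + \frac{\bar{x}_2}{S_2} \right), \qquad (4)$$

where S_j = Σ(x − x̄_j)², summed over the n_j observations of the jth regression. 
If we let X₀ = (α₂ − α₁)/(β₁ − β₂), and if we estimate (2), (3), and 
(4) by replacing σ² with s², the pooled estimate of the error mean 
square, then, according to a theorem by Fieller [1], the confidence 
limits for X₀ consist of those values for which 

$$\left[(a_2 - a_1)^2 - t^2 \hat{V}_a\right] - 2X_0\left[(a_2 - a_1)(b_1 - b_2) - t^2 \hat{C}\right] + X_0^2\left[(b_1 - b_2)^2 - t^2 \hat{V}_b\right] \le 0, \qquad (5)$$

where V̂_a, V̂_b and Ĉ are the estimates of (2), (3) and (4), and t is the appropriate level of the Student distribution for 
n₁ + n₂ − 4 degrees of freedom. 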

The method outlined above is the simplest application of a generalization 
given by Box and Hunter [2]. 
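Fieller's inequality is a quadratic in X₀, so when the coefficient of X₀² is positive the limits are its two roots. The sketch below works under that condition, uses hypothetical data, and takes the Student value from the caller: 2.447 for 6 degrees of freedom. If the coefficient of X₀² is not positive, the limits are unbounded and the function returns `None` for them. The function names are ours.

```python
import math

def fit_line(x, y):
    """Least squares line y = a + b x; returns (a, b, mean of x, S, residual SS, n)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    S = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / S
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, xbar, S, rss, n

def intersection_limits(x1, y1, x2, y2, t):
    """Estimate (1) and Fieller limits (5) for the abscissa of intersection."""
    a1, b1, m1, S1, rss1, n1 = fit_line(x1, y1)
    a2, b2, m2, S2, rss2, n2 = fit_line(x2, y2)
    s2 = (rss1 + rss2) / (n1 + n2 - 4)                            # pooled error mean square
    da, db = a2 - a1, b1 - b2
    va = s2 * (1 / n1 + 1 / n2 + m1 ** 2 / S1 + m2 ** 2 / S2)     # estimate of (2)
    vb = s2 * (1 / S1 + 1 / S2)                                    # estimate of (3)
    cov = s2 * (m1 / S1 + m2 / S2)                                 # estimate of (4)
    q2 = db ** 2 - t ** 2 * vb          # inequality (5): q2 X0^2 - 2 q1 X0 + q0 <= 0
    q1 = da * db - t ** 2 * cov
    q0 = da ** 2 - t ** 2 * va
    x_hat = da / db
    if q2 <= 0:
        return x_hat, None              # limits unbounded
    root = math.sqrt(max(q1 ** 2 - q0 * q2, 0.0))
    return x_hat, ((q1 - root) / q2, (q1 + root) / q2)

x = [0, 1, 2, 3, 4]
y1 = [1.0, 2.1, 2.9, 4.2, 5.0]          # hypothetical data
y2 = [5.1, 4.4, 3.6, 3.1, 2.0]
x_hat, limits = intersection_limits(x, y1, x, y2, t=2.447)
print(x_hat, limits)
```

The left side of (5) is never positive at X₀ = X̂, so when q2 > 0 the point estimate always lies inside the interval.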
138 NOTE: On the Estimation of the Mean of a 
Poisson Distribution from a Sample 
with the Zero Class Missing 


J. O. 
Statistical Research Unit of the Medical Research Council 
London School of Hygiene and Tropical Medicine, London, England 


In a sample from a Poisson distribution with the zero class missing, 
let the mean be x̄. 

The maximum likelihood estimate of the population mean λ is 
given by the solution of 

$$\bar{x} = \lambda/(1 - e^{-\lambda}). \qquad (1)$$

David and Johnson [1952] gave a table for λ̂ given x̄, but said, "It 
does not seem possible to obtain an explicit expression for λ̂," where λ̂ 
is the solution of the maximum likelihood equation. This opinion was 
endorsed by Finney and Varley [1955] who, however, pointed out that 
the equation could be solved rapidly by iterative and interpolatory 
procedures. 

Some years ago I discovered an explicit expression in the form of a 
Lagrange series and it may be worthwhile putting this on record. 

Writing a for the mean x̄, we have 

$$\lambda = a - a e^{-\lambda}, \qquad (2)$$

and Lagrange's expansion gives immediately 

$$\lambda = a - \sum_{r=1}^{\infty} \frac{r^{r-1}}{r!} \left(a e^{-a}\right)^r. \qquad (3)$$


Finney and Varley give the distribution of eggs laid in unopened 
flower heads of the black knapweed by the Knapweed gall-fly in two 
different years, 1935 and 1936. I take their distribution for 1935 as 
an example. The mean for 148 flower heads was 3.02. Accordingly 
λ̂ must be calculated from (3) with a = 3.02. Table 1 shows the details 
of the calculation; we find ae^{−a} = .1471769 and λ̂ = 2.8447. Finney 
and Varley give 2.845. 


TABLE 1 
CALCULATIONS FOR THE FINNEY-VARLEY DATA 

 (1)       (2)            (3)            (4)          (5) 
  r     r^{r−1}/r!    (ae^{−a})^r     (2) × (3)    3.02 − cumulative sum of (4) 
  1       1            .1471769        .14718        2.8728 
  2       1            .0216610        .02166        2.8512 
  3       1.5          .0031880        .00478        2.8464 
  4       2.6667       .0004692        .00125        2.8451 
  5       5.2083       .0000691        .00036        2.8448 
  6      10.8          .0000102        .00011        2.8447 


The series is convergent for a > 1; for, by Stirling's theorem, the 
general term of (3) satisfies 

$$\frac{r^{r-1}}{r!} \left(a e^{-a}\right)^r \sim \frac{\left(a e^{1-a}\right)^r}{\sqrt{2\pi}\, r^{3/2}},$$

and a e^{1−a} < 1 for a > 1. 
When a = 1, it is easy to show by expanding the exponential that the 
series 

$$\sum_{r=1}^{\infty} \frac{r^{r-1}}{r!} e^{-r} = 1$$

and λ̂ = 0, as is otherwise clear. 

The series (3) may be awkward for 1 < a < 2, but should converge 
satisfactorily for a > 2. 
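The Lagrange series (3) is easy to sum numerically and to compare with a direct solution of (1). In the sketch below (function names ours), the coefficients r^{r−1}/r! are computed on the log scale to avoid overflow. Newton's method provides the independent check. Evaluating a e^{−a} directly at a = 3.02 gives about 0.14738, slightly larger than the .1471769 of Table 1, so the partial sums here differ from column (5) in the fourth decimal place.

```python
import math

def lagrange_lambda(a, terms):
    """Partial sum of series (3): lambda = a - sum_{r>=1} r^(r-1)/r! * (a e^-a)^r."""
    theta = a * math.exp(-a)
    total = 0.0
    for r in range(1, terms + 1):
        log_coef = (r - 1) * math.log(r) - math.lgamma(r + 1)    # log of r^(r-1)/r!
        total += math.exp(log_coef + r * math.log(theta))
    return a - total

def ml_lambda(a, tol=1e-12):
    """Solve a = lambda / (1 - e^-lambda), equation (1), by Newton's method from lambda = a."""
    lam = a
    while True:
        f = lam - a * (1 - math.exp(-lam))
        step = f / (1 - a * math.exp(-lam))
        lam -= step
        if abs(step) < tol:
            return lam

for terms in (1, 2, 3, 6, 50):
    print(terms, round(lagrange_lambda(3.02, terms), 5))
print("Newton:", round(ml_lambda(3.02), 5))
```

Newton's method started at λ = a descends monotonically to the non-zero root, because f(λ) = λ − a(1 − e^{−λ}) is convex and increasing there.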

In 1926 A. G. McKendrick suggested a very simple method of
estimating the parameters of any discrete distribution with the zero
class missing. For the Poisson, the procedure is as follows. Let $x$ be
the variate and $N$ the total number of observations including the missing
values. Calculate $\sum x$ and $\sum x(x-1)$ from the observations, i.e. the
first two factorial moments (not the moment coefficients). The values
of these are unaffected by the absence of the frequency for $x = 0$. Now
$E(\sum x) = N\lambda$, where $\lambda$ is the population mean, and $E\{\sum x(x-1)\} =
N\lambda^2$. Thus an estimate of $N$ is given by the ratio $(\sum x)^2 / \sum x(x-1)$.

In Finney and Varley's example $\sum x = 447$, $\sum x(x-1) = 1404$,
and $(447)^2/1404 = 142.3$. This would make the zero frequency negative,
so we start by assuming it is zero. We take $N_1 = 148$ and
$\lambda_1 = 447/148 = 3$, nearly enough. This makes the zero frequency
$148e^{-3} = 7.37$, so we take $N_2 = 155.37$. This gives $\lambda_2 = 2.877$, and
the process is iterated as often as necessary, each step taking
$N_{k+1} = 148 + N_k e^{-\lambda_k}$ and $\lambda_{k+1} = 447/N_{k+1}$. We find:


Iteration      N          λ
    1        148          3
    2        155.37       2.877
    3        156.75       2.852
    4        157.05       2.846
    5        157.12       2.845
    6        157.13       2.845
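The procedure can be sketched in a few lines of Python (function names are illustrative). `mckendrick_ratio` is the factorial-moment estimate $(\sum x)^2/\sum x(x-1)$; `iterate_zero_class` repeats the step used in the table, adding the expected zero class $N_k e^{-\lambda_k}$ to the 148 observed heads and re-estimating $\lambda_{k+1} = \sum x / N_{k+1}$. At convergence $N(1 - e^{-\lambda}) = 148$ and $\lambda = 447/N$, so $\lambda = a(1 - e^{-\lambda})$ with $a = 447/148$, which is equation (2).

```python
import math


def mckendrick_ratio(sum_x, sum_x_x_minus_1):
    """McKendrick's estimate of N, the total count including the missing zeros."""
    return sum_x ** 2 / sum_x_x_minus_1


def iterate_zero_class(sum_x, n_observed, lam_start, n_steps):
    """Return [(N_k, lambda_k)] for k = 1..n_steps.

    Starts from N_1 = n_observed (zero class assumed empty) and lambda_1 = lam_start;
    each step adds the expected zero frequency N_k * exp(-lambda_k) to the
    observed count and re-estimates lambda as sum_x / N.
    """
    n_total, lam = float(n_observed), lam_start
    history = [(n_total, lam)]
    for _ in range(n_steps - 1):
        n_total = n_observed + n_total * math.exp(-lam)
        lam = sum_x / n_total
        history.append((n_total, lam))
    return history


if __name__ == "__main__":
    n_hat = mckendrick_ratio(447, 1404)
    print(f"ratio estimate {n_hat:.1f}; implied zero class {n_hat - 148:.1f}")
    # Start from lambda_1 = 3, as in the text ("447/148 = 3, nearly enough").
    for k, (n, lam) in enumerate(iterate_zero_class(447, 148, 3.0, 6), start=1):
        print(k, round(n, 2), round(lam, 3))
```

Starting instead from the unrounded $447/148$ changes the early iterates slightly but not the limit.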


A similar idea seems to be the basis of the comprehensive treatment 
by Hartley [1958] of the problem of maximum likelihood estimation 
from incomplete data. 


REFERENCES 


David, F. N. and Johnson, N. L. [1952]. The truncated Poisson. Biometrics 8, 
275-85. 

Finney, D. J. and Varley, G. C. [1955]. An example of the truncated Poisson 
distribution. Biometrics 11, 387-94. 

Hartley, H. O. [1958]. Maximum likelihood estimation from incomplete data. 
Biometrics 14, 174-94. 

McKendrick, A. G. [1926]. Applications of mathematics to medical problems. 
Proc. Edin. Math. Soc. 44, 1-34. 


139 QUERY: Differential Regression 


The growth of oat coleoptiles is affected by light, and this effect 
may be modified by the presence of indoleacetic acid and riboflavin. 
A $2^3$ factorial experiment with indoleacetic acid ($I$), riboflavin ($R$)
and light ($L$) was conducted in 4 randomized blocks. Each plot consisted
of 7 coleoptiles, the variables measured being $X$, the initial length,
and $Y$, the increase in length. The sums of the 7 measurements of $X$ and
of $Y$ are entered in Table 1, where the subscript 0 indicates absence of
the factor and the subscript 1 indicates the presence of a controlled
amount of the factor.


TABLE 1
OAT COLEOPTILE GROWTH DATA FOR THE 2³ FACTORIAL


L₀
             I₀R₀            I₀R₁            I₁R₀            I₁R₁
Block      X       Y       X       Y       X       Y       X       Y
  1       91     148     119     156     124     162     122     159
  2      103     148     109     152      92     157     109     142
  3       93     160      91     163      97     161      89     147
  4       93     152     102     157      89     170     100     154
Mean   95.00  152.00  105.25  157.00  100.50  162.50  105.00  150.50


L₁
             I₀R₀            I₀R₁            I₁R₀            I₁R₁
Block      X       Y       X       Y       X       Y       X       Y
  1      ...     ...     107     105      94     115     101     104
  2      ...     ...     ...     103      91     116     100     101
  3      ...     124      97     ...     110     100      97     108
  4      ...     ...     ...     ...      95     120     100     115
Mean  101.25  110.25  103.50  110.75   97.50  112.75   99.50  107.00


X = Total initial length in mm. Y = Total increment of growth in mm.


The failure of the normal analysis of covariance becomes apparent
if separate analyses of covariance, assuming no block interactions, are
run for $L_0$ and $L_1$. The error lines are shown in Table 2.
TABLE 2
ERROR ROWS FOR SEPARATE ANALYSES OF COVARIANCE


               d.f.      x²        xy        y²      s²y·x
Error (L₀)       9     777.56     97.25    231.00    27.355
Error (L₁)       9     586.06   -582.94    750.56    21.340


From these the slope for the non-illuminated ($L_0$) data is $b_0 = .1251$
and the slope for the illuminated data is $b_1 = -.9947$. The question
then arises of how to test the effects of $L$, $I$ and $R$ and their interactions.
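The slopes and the last column of Table 2 follow from the error sums of squares and products: $b_i = \sum xy / \sum x^2$, and the residual mean square is $\left(\sum y^2 - (\sum xy)^2/\sum x^2\right)/8$, one degree of freedom being used for the slope. A minimal Python check (names are illustrative) also gives the pooled value 24.35 with 16 degrees of freedom that is used in the answer.

```python
def error_line_regression(df, sxx, sxy, syy):
    """Slope, residual SS, residual d.f. and residual mean square for one error line."""
    slope = sxy / sxx
    resid_ss = syy - sxy ** 2 / sxx
    resid_df = df - 1                 # one d.f. used by the slope
    return slope, resid_ss, resid_df, resid_ss / resid_df


# Error lines from Table 2: (d.f., x^2, xy, y^2)
L0 = error_line_regression(9, 777.56, 97.25, 231.00)
L1 = error_line_regression(9, 586.06, -582.94, 750.56)

# Pooled residual mean square, appropriate if the two s^2 values look homogeneous.
pooled_s2 = (L0[1] + L1[1]) / (L0[2] + L1[2])

if __name__ == "__main__":
    print(f"b0 = {L0[0]:.4f}, s2(L0) = {L0[3]:.3f}")
    print(f"b1 = {L1[0]:.4f}, s2(L1) = {L1[3]:.3f}")
    print(f"pooled s2 = {pooled_s2:.2f} on {L0[2] + L1[2]} d.f.")
```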


ANSWER: A simple solution is available by noting that the tests of
the effects of each of these factors and their interactions are defined
as single degrees of freedom. Let $y_{ijk}$ and $Y_{ijk}$ denote the adjusted
and unadjusted means respectively, where $i$ indicates the level of light,
$j$ the level of indoleacetic acid and $k$ the level of riboflavin. Then the
single degree of freedom contrasts would be those in Table 3.


TABLE 3
SINGLE DEGREE OF FREEDOM CONTRASTS


Source                  Y000  Y001  Y010  Y011  Y100  Y101  Y110  Y111
Indoleacetic Acid (I)     -     -     +     +     -     -     +     +
L × R                   ...
I × R                   ...
L × I × R               ...
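Coefficients for all seven contrasts can be generated mechanically. The Python sketch below (names are illustrative) lists the treatment combinations in the order 000, 001, ..., 111 of Table 3, codes level 0 as -1 and level 1 as +1, and forms each interaction as the product of the signs of its factors, which makes the seven contrasts mutually orthogonal. The overall sign of a contrast is a convention (the $L \times I$ contrast written out in the text has the opposite sign to this product rule); it does not affect the test, which uses $|C|$.

```python
from itertools import combinations, product


def factorial_contrasts(factors=("L", "I", "R")):
    """Sign vectors of all main effects and interactions in a 2^k factorial.

    Treatment combinations are ordered 000, 001, ..., 111 (last factor
    changing fastest); level 0 is coded -1 and level 1 is coded +1.
    """
    combos = list(product((0, 1), repeat=len(factors)))
    contrasts = {}
    for size in range(1, len(factors) + 1):
        for subset in combinations(range(len(factors)), size):
            name = " x ".join(factors[i] for i in subset)
            signs = []
            for combo in combos:
                sign = 1
                for i in subset:
                    sign *= 1 if combo[i] == 1 else -1
                signs.append(sign)
            contrasts[name] = signs
    return combos, contrasts


if __name__ == "__main__":
    combos, contrasts = factorial_contrasts()
    print("Source     " + " ".join("Y" + "".join(map(str, c)) for c in combos))
    for name, signs in contrasts.items():
        print(f"{name:<10} " + " ".join(f"{('+' if s > 0 else '-'):>4}" for s in signs))
```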


That is, the contrast


$$C_{LI} = -y_{000} - y_{001} + y_{010} + y_{011} + y_{100} + y_{101} - y_{110} - y_{111}$$


will test the effect of the light by indoleacetic acid interaction. The
variance of this contrast is easily found by replacing $y_{ijk}$ by $Y_{ijk} -
b_i(X_{ijk} - X_0)$, where $X_{ijk}$ is the mean of the $X$'s for the $ijk$th treatment
combination and $X_0$ is the $X$ value to which all the means are adjusted.
Then

$$C_{LI} = -Y_{000} - Y_{001} + Y_{010} + Y_{011} + Y_{100} + Y_{101} - Y_{110} - Y_{111} + b_0(X_{000} + X_{001} - X_{010} - X_{011}) + b_1(-X_{100} - X_{101} + X_{110} + X_{111}).$$

The $X$'s are constants and the $Y$'s and $b$'s are all independent, so

$$\operatorname{Var}(C_{LI}) = \sum_{ijk} \operatorname{Var}(Y_{ijk}) + (X_{000} + X_{001} - X_{010} - X_{011})^2 \operatorname{Var}(b_0) + (-X_{100} - X_{101} + X_{110} + X_{111})^2 \operatorname{Var}(b_1).$$


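In this particular case I would not hesitate to pool the $s^2_{y\cdot x}$'s, so $s^2_{y\cdot x} =
24.35$ with 16 degrees of freedom. Since each treatment mean is based on
4 blocks, $\operatorname{Var}(Y_{ijk})$ is estimated by $s^2_{y\cdot x}/4$, and $\operatorname{Var}(b_i)$ by $s^2_{y\cdot x}$
divided by the $x^2$ entry for $L_i$ in Table 2. Then


$$\operatorname{Var}(C_{LI}) = \left[2 + \frac{(5.25)^2}{777.56} + \frac{(7.75)^2}{586.06}\right](24.35) = \left[2 + \frac{27.5625}{777.56} + \frac{60.0625}{586.06}\right](24.35) = 52.06,$$

$$C_{LI} = -12.30.$$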


Thus $|C_{LI}|/\sqrt{\operatorname{Var}(C_{LI})} = 1.70$ is compared to a $t$ with 16 degrees of
freedom at a chosen significance level to test the effect of the
light by indoleacetic acid interaction. The tests of the other six degrees
of freedom may be determined in a similar manner.
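The numbers above can be reproduced from the treatment means of Table 1 and the error lines of Table 2. The Python sketch below is illustrative only: it assumes that, within each level of light, the four columns of Table 1 are the combinations $I_0R_0$, $I_0R_1$, $I_1R_0$, $I_1R_1$ in that order, and it uses the pooled $s^2_{y\cdot x} = 24.35$. Under that ordering the contrast as written evaluates to $+12.30$; its sign depends on the ordering and sign convention assumed, but $|C_{LI}|$, its variance and the $t$ value do not.

```python
import math

# Treatment means from Table 1, keyed by (light i, IAA j, riboflavin k).
# ASSUMPTION: Table 1 columns are I0R0, I0R1, I1R0, I1R1 within each light level.
X_MEAN = {(0, 0, 0): 95.00, (0, 0, 1): 105.25, (0, 1, 0): 100.50, (0, 1, 1): 105.00,
          (1, 0, 0): 101.25, (1, 0, 1): 103.50, (1, 1, 0): 97.50, (1, 1, 1): 99.50}
Y_MEAN = {(0, 0, 0): 152.00, (0, 0, 1): 157.00, (0, 1, 0): 162.50, (0, 1, 1): 150.50,
          (1, 0, 0): 110.25, (1, 0, 1): 110.75, (1, 1, 0): 112.75, (1, 1, 1): 107.00}
SXX = {0: 777.56, 1: 586.06}    # error x^2 for L0 and L1 (Table 2)
SXY = {0: 97.25, 1: -582.94}    # error xy for L0 and L1 (Table 2)
S2 = 24.35                      # pooled s^2_{y.x}, 16 d.f.
N_BLOCKS = 4


def adjusted_contrast(coef, x0):
    """Contrast of adjusted means y_ijk = Y_ijk - b_i (X_ijk - x0) and its variance."""
    b = {i: SXY[i] / SXX[i] for i in (0, 1)}
    value = sum(w * (Y_MEAN[t] - b[t[0]] * (X_MEAN[t] - x0)) for t, w in coef.items())
    # Unadjusted means: each has variance s^2 / (number of blocks).
    var = S2 * sum(w * w for w in coef.values()) / N_BLOCKS
    # Each slope contributes (weight on b_i)^2 * s^2 / Sxx_i.
    for i in (0, 1):
        weight = sum(w * (X_MEAN[t] - x0) for t, w in coef.items() if t[0] == i)
        var += weight ** 2 * S2 / SXX[i]
    return value, var


# C_LI as written in the text: -000 -001 +010 +011 +100 +101 -110 -111
C_LI = {(0, 0, 0): -1, (0, 0, 1): -1, (0, 1, 0): 1, (0, 1, 1): 1,
        (1, 0, 0): 1, (1, 0, 1): 1, (1, 1, 0): -1, (1, 1, 1): -1}

if __name__ == "__main__":
    c, v = adjusted_contrast(C_LI, x0=100.0)
    print(f"C = {c:.2f}, Var(C) = {v:.2f}, |C|/sqrt(Var) = {abs(c) / math.sqrt(v):.2f}")
```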

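It should be noted that, with the exception of the test for the effect
of light, the tests are independent of $X_0$, the $X$ to which the $Y$'s are
adjusted, because for each of the other contrasts the coefficients sum to
zero within each level of light, so that $X_0$ cancels. In testing light,
the weighting of $b_i$ will be


$$\pm\left(\sum_{jk} X_{ijk} - 4X_0\right),$$

with opposite signs for $b_0$ and $b_1$.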
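To see why light is the exception, the Python sketch below (illustrative names) adjusts the light contrast, taken here as the sum of the four $L_1$ adjusted means minus the sum of the four $L_0$ adjusted means, to different values of $X_0$, using the means of Table 1 and the slopes from Table 2. Only within-light totals enter, so no assumption about column order is needed. The weights on $b_0$ and $b_1$ are $(\sum_{jk} X_{0jk} - 4X_0)$ and $-(\sum_{jk} X_{1jk} - 4X_0)$, so the contrast changes by $4(b_1 - b_0)$ per unit change in $X_0$ and would be free of $X_0$ only if the two slopes were equal.

```python
# Sums of the four treatment means within each light level (Table 1).
SUM_X = {0: 95.00 + 105.25 + 100.50 + 105.00, 1: 101.25 + 103.50 + 97.50 + 99.50}
SUM_Y = {0: 152.00 + 157.00 + 162.50 + 150.50, 1: 110.25 + 110.75 + 112.75 + 107.00}
SLOPE = {0: 97.25 / 777.56, 1: -582.94 / 586.06}   # b0, b1 from Table 2


def b_weights(x0):
    """Weights multiplying b0 and b1 in the light contrast (L1 minus L0)."""
    return {0: SUM_X[0] - 4 * x0, 1: -(SUM_X[1] - 4 * x0)}


def light_contrast(x0):
    """Sum of L1 adjusted means minus sum of L0 adjusted means, adjusted to x0."""
    adjusted = {i: SUM_Y[i] - SLOPE[i] * (SUM_X[i] - 4 * x0) for i in (0, 1)}
    return adjusted[1] - adjusted[0]


if __name__ == "__main__":
    for x0 in (95.0, 100.0, 105.0):
        w = b_weights(x0)
        print(f"X0 = {x0}: weights {w[0]:.2f}, {w[1]:.2f}; C_L = {light_contrast(x0):.2f}")
```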


A solution of this problem was given by Walter A. Hendricks in
1935 (The Journal of Agricultural Science 25: 258-63). In that paper
the term "differential regression" was introduced and the $Y$'s were
adjusted by the respective $b$'s and then subjected to an analysis of
variance. The lack of independence of the adjusted $Y$'s was not taken
into account.


JOHN T. WEBSTER
North Carolina State College 
Raleigh, North Carolina 
ABSTRACTS


The following papers were presented at meetings of the Eastern North
American Region of the Biometric Society and the Biometric Section of the
American Statistical Association held in Chicago, Illinois, U. S. A. from
December 27-30, 1958.


574 R. E. BARGMANN (Virginia Polytechnic Institute, Blacksburg,
Virginia, U. S. A.). Some Interpretations in the Analysis of Univariate
and Multivariate Transformed Data.


1. The observation of significantly reduced variances in some assay 
experiments with higher animals has a simple statistical explanation 
based upon a result obtained by Karl Pearson. If there is heterogeneity 
of susceptibility within batches, the resulting variances are smaller. 
This result generalizes to covariances in the multinomial case. 

2. An inverse hyperbolic cosine transformation, which can be used 
to stabilize variances in non-central F, also produces a rapid approach 
to normality, after elimination of bias. Studies on this transformation 
are in progress and yield satisfactory results. This kind of approach
may be useful for testing equality of non-centrality in two or more
parallel studies.

3. A study is presented on patterns of dependence which reduce to 
independence under certain linear transformations. Acceptance of
independence after a Hadamard transformation, where applicable,
implies acceptance of the hypothesis of equality of variances and equality
of all complete multiple correlations ("equipredictability") in the
untransformed set. Under the Helmert transformation, the test of equality
of covariances, under the assumption that variances are equal (not
compound symmetry), reduces to a test of independence in the
transformed set. The first of these tests has useful applications for the
problem of reducing the number of variates; the second has been applied
in studies of the "halo effect" in psychology, and may be important for
some genetic studies.
575 D. W. BEHNKEN (Princeton University, Princeton, New Jersey,
U. S. A.). Some New Designs for Exploring Response Surfaces.


Two new general classes of designs for exploring response surfaces 
are discussed. A complete description of these designs appears in two 
unpublished joint papers by Box and Behnken. 

The first class of designs is produced by taking the design matrix
$D_1$ of any $k$-factor first order rotatable design with the minimum number
of $n = k + 1$ rows and generating $k - 1$ additional matrices $D_s$, $s = 2,
3, \ldots, k$, by forming all possible sums of the row vectors of $D_1$ taken
$s$ at a time. $D_1, D_2, \ldots, D_s, \ldots, D_k$ will hence be $\binom{n}{s}$
by $k$ matrices, and if these are considered as submatrices of an $N$ by $k$
design matrix $D$ ($N = 2^n - 2$), second order rotatable designs can be
found by multiplying each submatrix by a suitable constant $a_s$. A
standard formula for the set of constants $a_s$ is found for all $k$, as well as
specific solutions in which some of the $a_s$ values are zero, leading to
designs involving fewer points.
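A sketch of the first construction in Python, under stated assumptions: $D_1$ is taken to be a regular simplex (one minimum-point first order rotatable design), built here from normalized Helmert-type contrasts so that its columns are orthonormal and sum to zero, and the constants $a_s$, which the abstract does not give, are left out. The function names are illustrative. The sketch forms each $D_s$ from all sums of $s$ rows and confirms the count $N = 2^n - 2$.

```python
import itertools
import math


def simplex_design(k):
    """n = k + 1 rows forming a regular simplex; columns are orthonormal contrasts."""
    n = k + 1
    d = [[0.0] * k for _ in range(n)]
    for j in range(1, k + 1):
        norm = math.sqrt(j + j * j)        # length of (1, ..., 1, -j, 0, ..., 0)
        for i in range(j):
            d[i][j - 1] = 1.0 / norm
        d[j][j - 1] = -j / norm
    return d


def row_sum_submatrices(d1):
    """[D_1, D_2, ..., D_k]: D_s holds every sum of s distinct rows of D_1."""
    n, k = len(d1), len(d1[0])
    subs = []
    for s in range(1, n):                  # s = 1, ..., k  (k = n - 1)
        subs.append([[sum(d1[i][c] for i in rows) for c in range(k)]
                     for rows in itertools.combinations(range(n), s)])
    return subs


if __name__ == "__main__":
    k = 3
    subs = row_sum_submatrices(simplex_design(k))
    print([len(d) for d in subs], "total rows:", sum(len(d) for d in subs))
```

Because the rows of $D_1$ sum to zero, $D_k$ is simply $-D_1$ with its rows in reverse order.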

The second class of designs is one in which only three levels of each
factor are required. Both rotatable and nearly-rotatable designs are
used. These designs are generated by taking the transpose of the
incidence matrix of a suitable Partially Balanced Incomplete Block Design
(PBIB) and replacing the $s$ non-zero elements of each row by one of the
columns of a $2^s$ factorial design matrix and each zero element by a $2^s$
by 1 vector of zeros. The resulting matrix can be considered as a new
design matrix. Thus if the original PBIB was for $k$ treatments in $b$ blocks
of size $s$, the derived design will be for $k$ factors and will involve $b \times 2^s$
experiments. Fractions of the $2^s$ factorial may also be used when
compatible.
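The second construction is equally mechanical. In the Python sketch below (names are illustrative), each block of the incomplete block design is given as the tuple of treatments it contains, which is one row of the transposed incidence matrix; the $s$ factors in the block take the $\pm 1$ levels of a full $2^s$ factorial and all other factors are held at 0. For illustration only, the example uses the small balanced design with $k = 3$ treatments in $b = 3$ blocks of size $s = 2$, which gives $3 \times 2^2 = 12$ runs; no centre points are included.

```python
import itertools


def three_level_design(blocks, k):
    """Expand each block (tuple of factor indices) into a 2^s factorial on those factors."""
    runs = []
    for block in blocks:
        for levels in itertools.product((-1, 1), repeat=len(block)):
            run = [0] * k                     # factors not in the block stay at 0
            for factor, level in zip(block, levels):
                run[factor] = level
            runs.append(run)
    return runs


if __name__ == "__main__":
    # Blocks {1,2}, {1,3}, {2,3}, written with 0-based factor indices.
    design = three_level_design([(0, 1), (0, 2), (1, 2)], k=3)
    for run in design:
        print(run)
```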

Conditions under which the designs are rotatable and nearly-rotatable
are discussed and examples of both are given.

Properties of both classes of design are set forth. 


576 G. E. P. BOX (Princeton University, Princeton, New Jersey,
U. S. A.). Some Problems and Application in Non-Linear Estimation.


An important problem facing chemists, physicists, and biologists 
arises when the experimenter believes that he can describe some phe- 
nomenon in terms of a function developed from theoretical consider- 
ations. The functional form is often not known explicitly but may be 
defined in terms of differential equations. Problems that arise in fitting 
such functions to data, testing fit, estimating constants and iterating 
towards a more truly descriptive functional form when the first
postulated form proves inadequate, are briefly discussed, using examples
from the field of chemistry. In particular, procedures for checking 
on the existence of a true minimum sum of squares, of testing local 
linearity, of transforming parameters to ensure greater linearity and 
obtaining approximate confidence intervals are mentioned. 

The same methods can be used to derive transformations of the 
independent variables when fitting empirical functions. A set of orthog- 
onal vectors can be calculated for a given design which makes it simple 
to isolate separate degrees of freedom in the lack of fit sum of squares. 
Each single degree of freedom is associated with a transformation 
parameter for each of the independent variables. 


577 J. B. CHASSAN (Saint Elizabeth's Hospital, Washington, D. C.,
U. S. A.). The Development of Clinical Statistical Systems for
Psychiatry.


Some of the basic properties of the data of clinical psychiatry and 
their implications for the application of statistics may be stated briefly 
as follows: (1) An underlying variability in the patient-state requires 
frequent repeated observations of each patient as part of the opera- 
tional definition of his psychopathology, together with the development 
of appropriate stochastic models, constructed so as to avoid errors 
in inference generated by questionable assumptions of independence, 
and also designed to recognize the subject-matter importance of sto- 
chastic dependence per se. (2) Unions rather than intersections of 
events cast in the framework of extensional, operational definitions of 
clinical concepts to achieve face-validity must form the basis of frequent 
patient-state observations to replace over-elaborate and contrived
one-shot momentary readings devoid of clinical content, in providing a more
reasonable order of dimensionality for the repetition of observation. 
(3) The existence of subjective, a priori differences between
investigators, such as that concerning the order of preference of
patient-states, must be recognized and dealt with. The suggested definition by
L. J. Savage of Statistics proper as “dealing with vagueness and with 
interpersonal differences in decision situations,” appears particularly 
pertinent in this context. 


578 W. S. CONNOR (National Bureau of Standards, Washington,
D. C., U. S. A.). Some Recent Work on Mixed Fractional
Factorial Designs.


Work is in progress at the National Bureau of Standards to produce 
a catalogue of fractional factorial experiment designs for experiments 
in which $m$ factors are studied at two levels and $n$ factors at three levels,
or briefly the $2^m 3^n$ series. Designs now have been constructed for the
cases 2‘3', 2°3', --- , 2°3'; 2°37, --- , 2°37; 2'3°. The treatment 
combinations have been chosen so that all main and two-factor inter- 
action effects can be estimated—i.e., they are not aliased with other 
main or two-factor interaction effects. In fact, in each design all, or 
almost all, of the estimates of these effects are uncorrelated. 

The method of construction consists of associating fractions of the
$2^m$ factorial design with fractions of the $3^n$ factorial in such a way that
every treatment combination in each fraction of the $2^m$ factorial is
adjoined to every treatment combination in the corresponding fraction
of the $3^n$ factorial.


579 H. A. DAVID and BEVERLY E. ARENS (Virginia Polytechnic
Institute, Blacksburg, Virginia, U. S. A.). Optimal Spacing in
Regression Analysis.


Frequently the regression of a dependent variate $y$ on an independent
variable $x$, under the control of the experimenter, is known to be roughly
linear in the (finite) range of interest of $x$. For reasons of simplicity or
economy it may be desired to estimate the true regression line from two
observations only or from observations taken at two levels of $x$ only.
The question then arises how these two abscissae should be spaced so 
as to minimize, in some sense, the effects of possible non-linearity. Two 
alternative criteria are used to arrive at optimal spacings when the true 
regression is in fact quadratic, viz. minimization of the maximum ab- 
solute error or of the mean square error of the approximating straight 
line. In both cases the optimal abscissae turn out to be simple functions 
of the ratio $|c|/\sigma$, where $c$ is the coefficient of the quadratic term and
$\sigma$ is the standard deviation of $y$. It is also shown that if a straight-line
fit has been decided on, nothing is gained by taking observations at more 
than two levels of x. 


580 W. T. FEDERER (Cornell University, Ithaca, New York).
Rectangular Lattice Designs for $v = pk^n$ in Incomplete Blocks
of Size $k$.


The construction and analysis of rectangular lattice designs for
$v = pk^n$ in incomplete blocks of size $k$ are given. For $n = 1$, $p = k + 1$,
these designs are the rectangular lattices given by B. Harshbarger; for
$p = k$, a prime number, these designs are the prime-power lattice
designs given by O. Kempthorne and W. T. Federer. Various numbers of
arrangements are available, but the common one found involves three
arrangements for $r = 3q$ replicates. Experimental designs with partial
and complete confounding of some treatment contrasts with incomplete 
blocks are being studied. 


581 WALTER DEAN FOSTER and ELWOOD KITSON WOLFE,
JR. (Army Chemical Corps, Frederick, Maryland, U. S. A.).
Response Surface Techniques Versus Factorial Analysis in a
Development Application.


A typical problem in engineering development where the criterion of 
evaluation is a biological response is presented as a basis for comparing 
the two types of analyses suggested in the title. 

The experimental situation is constrained by many factors. These 
are the requirements of procurement of a complex variety of materials, 
the advance scheduling of many independent development programs, 
the inherent lag in waiting for the biological response, and the need for 
detailed operating instructions for each particular test. 

In the face of the restraints imposed by the nature of the experi- 
mentation, sequential analyses are virtually impossible to apply without 
imposing a hopeless confusion upon the crew operating the test facilities. 
This paper is confined to the comparison of a 3 × 3 factorial design and
its conventional analysis of variance with the same data analyzed by 
multiple regression techniques to approximate the response surface 
where the size (length) of experiment is determined in advance. Ad- 
vantages and disadvantages for each approach from the point of view 
of this particular experimental situation are presented and discussed. 
The sources of variation are isolated, identified, and cross-referenced in 
each of the two kinds of analyses. 

It is concluded that the net difference in information accorded by the 
two methods is small. However, the convenience of the factorial design 
to the engineer confers a distinct advantage to its continued use for 
most of the development problems. 


582 JAMES P. GEORGE (University of Tennessee, Knoxville,
Tennessee, U. S. A.). Statistical Interpretation of Experiments in the
Physiology of Reproduction in Dairy Cattle.


(1) The fertilization rate of ova was very significantly improved in 
consequence of progesterone treatment. 

(2) The number of corpora formed was significantly reduced in 
consequence of progesterone treatment. 
(3) The percentage of cows with estrus was very significantly
increased in consequence of progesterone treatment.

(4) The number of follicles produced was very significantly reduced
in consequence of progesterone treatment.

(5) The number of follicles was very significantly affected (follicle
stimulating hormones vs. luteinizing hormones) in consequence of
differences in hormone ratios.


583 R. GNANADESIKAN (Procter and Gamble Company, Cincinnati,
Ohio, U. S. A.). The Application of Multivariate Analysis
of Variance (Manova) to Chemical Process Problems.


This paper deals with some applications of multivariate techniques, 
the theory of which has been available for some time now in the litera- 
ture. Specifically, the problems considered are regression and analysis 
of variance ones where the data is obtained from certain chemical proc- 
esses which yield more than one response on every unit which is studied 
under a formally designed experiment. The simultaneous analysis of 
such multi-response experiments, under the classical fixed Model or 
Model I, is illustrated using a 2° factorial design. Both tests of signifi- 
cance and confidence regions for parametric functions which are natural 
measures of departure from null hypotheses are given. One possible use 
of the confidence regions as a decision tool is also indicated. The pro- 
cedures dealt with here are, of course, valid for both quantitative and 
qualitative factors (as against responses) in the experiment. However, 
since factors are commonly quantitative in chemical process problems, 
an example is given of a multiple regression problem with three re- 
sponses, the theoretical results of which are, of course, essentially the 
same as those for ANOVA problems. A multivariate test for lack of 
fit of the assumed model is also illustrated in the example. In both 
problems alternative tests of significance are applied and the results 
compared.


584 TAVIA GORDON (National Institutes of Health, Bethesda,
Maryland, U. S. A.). Some Methodological Problems in Long-Term
Studies of Cardiovascular Disease.


In diseases which develop slowly, such as coronary and hypertensive 
heart disease, the approach which is often considered most desirable 
is long-term surveillance of closed populations. However, there are 
problems in the choice of populations for study and questions about 
the inferences that may be drawn from this survey approach. 

The ten years' experience from the Framingham Heart Epidemiology
Study is reviewed in terms of biases in initial response and biases in
follow-up by clinical and other means. The probable effects of these on
analytical results and the implications of these for longitudinal studies
in general are discussed. The Framingham prospective experience is
also examined for the inferences it permits about the validity of
conclusions drawn from retrospective studies.


585 DAVID G. GOSSLEE (University of Connecticut, Storrs,
Connecticut, U. S. A.). Level of Significance and Power of the
Unweighted Means Test.


The method of unweighted means is frequently used in the analysis 
of variance of data in which the subclass numbers are disproportionate. 
This paper is a summary of a study of the effect of the inequality of 
subclass numbers on the level of significance and power of the un- 
weighted means test. 

Exact and approximate expressions for the level of significance were 
obtained. The approximate level of significance was found to be a 
function of the coefficient of variation of the variances of the class means 
when the number of classes and the degrees of freedom for error are 
held constant. The results indicate that the test yields too many sig- 
nificant results, but that the disturbance to the 5 per cent level of 
significance is moderate. 

Approximate expressions for the power for a nominal level of
significance of 5 per cent were developed and checked by empirical
sampling experiments. It was found that the power increases as the harmonic
mean increases when the number of classes, the degrees of freedom for 
error, and the size of the class effects are held constant. 

A more limited study of the level of significance and power for 
nominal levels of significance of 1 per cent and 10 per cent is also sum- 
marized in the paper. 


586 H. O. HARTLEY (Iowa State College, Ames, Iowa, U. S. A.).
Some Problems in Linear and Non-Linear Programming.


This paper consists of two parts. In the first part we consider certain 
non-standard problems in linear programming which arise in the mixing 
of feed-stuffs of known nutrient analysis to produce feed mixes with 
specified nutrient requirements at minimum cost. In the second part 
we develop a new method of non-linear programming. It is shown that
a certain class of such problems can be approximated by a standard
tableau for linear programming and computationally solved by the
simplex method. This reduction is achieved by a polygonal
approximation of the non-linear functional relationships which are involved.
The essential assumption made is that the objective function consists 
of additive components each depending on one activity level only. 
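The polygonal approximation can be illustrated on a single additive component $f(x)$ of the objective. The Python sketch below (names and the example function are illustrative; the abstract does not give Hartley's formulation) replaces $f$ by straight-line segments between chosen breakpoints. For a convex component the chords lie on or above the curve, so the error is one-sided and shrinks as the breakpoints are brought closer together.

```python
import bisect


def polygonal(f, breakpoints):
    """Piecewise-linear interpolant of f through the given increasing breakpoints."""
    xs = list(breakpoints)
    ys = [f(x) for x in xs]

    def g(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x lies outside the breakpoint range")
        j = min(bisect.bisect_right(xs, x), len(xs) - 1)   # right end of the segment
        x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
        return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

    return g


def square(x):
    """An illustrative convex component."""
    return x * x


if __name__ == "__main__":
    g = polygonal(square, [0.0, 1.0, 2.0, 3.0])
    for x in (0.5, 1.5, 2.5, 3.0):
        print(x, square(x), g(x))
```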


587 C. C. KRAUSE and H. SMITH, JR. (Procter and Gamble Co.,
Cincinnati, Ohio, U. S. A.). Some Uses of $\chi^2$ in Market Research
with Extensions to Multi-Dimensional Tables.


The paper is concerned with various applications of the chi-square 
test for contingency tables with special emphasis on the market research 
sampling problem. 

Part I deals with two-way contingency tables. The hypotheses to
be tested are those of independence and consistency of response. In one
further case, the test for independence of response, given individual
response consistency, will be shown.

Part II deals with the extension of the two-way problems to three- 
way problems. In certain sampling situations, this extension involves 
the non-parametric generalization of multivariate analysis. Relation- 
ships between these cases and corresponding correlation problems in 
the multivariate normal case will be shown. 

Throughout the paper, the multinomial distribution will be the basis 
of discussion. 


588 A. W. KIMBALL and R. F. KIMBALL (Oak Ridge National
Laboratory, Oak Ridge, Tennessee, U. S. A.). Estimation from
Observations Having Specified but Unassigned Expectations in
a Radiation Experiment with Paramecium Aurelia.


This paper describes an unusual regression problem in which ob- 
servations may have one of several alternative expectations. The 
expectation for any observation is determined by the values of the 
independent variables associated with it. Since the levels of the inde- 
pendent variables are selected arbitrarily, the determination of the 
expectation is not a stochastic process. 

The problem arose in connection with an experiment designed to 
investigate the effect of postirradiation streptomycin treatment on the 
mutagenic property of X-rays in Paramecium aurelia. A hypothesis 
is advanced to explain the effect, and a simple model suggested by the 
hypothesis is fitted to the data. Since standard regression theory is 
inapplicable, an iterative fitting procedure based on a modification of 
conventional methods is employed. 
589 H. E. MCKEAN and V. L. ANDERSON (Purdue University,
Lafayette, Indiana, U. S. A.). Chromosome Analysis in
Biometrical Genetics.


The practical difficulty encountered in direct attempts to estimate 
the components of genetic variance from the model given by Kemp- 
thorne (Genetics 40:153-67) for random mating diploid populations can 
be partially obviated by formulating an artificial “chromosome” popu- 
lation whose basic segregating hereditary unit is the chromosome. The 
procedure involved is the selection of two chromosomes of each type at 
random from the random mating population and the formulation of all 
possible genotypes from the sampled chromosomes. The chromosomes 
sampled may be preserved by use of crossover inhibitors, so that the 
chromosomes are now observable genetic units. 

The problem actually examined here is the special case of no multiple
alleles and gene frequency 1/2. It is found that the average value of
the “additive” chromosome variance is one-half the additive genetic vari- 
ance of the original random mating population whether or not domi- 
nance is present, but assuming no epistasis. However, the estimation 
of the dominance variation is not so direct, since the expectation of the 
chromosome dominance variance contains cross product terms of domi- 
nance deviations for linked loci, which cannot be ignored except in 
special circumstances. On the other hand, replication of the study by 
considering several independent chromosome populations allows direct 
estimation of the dominance variance of the original population, and 
also allows for inferences about the nature of the dominance deviations 
in the organism. Further results have been obtained assuming dual 
epistasis, and estimation is still possible with moderate modifications. 


590 IWAO M. MORIYAMA (National Office of Vital Statistics,
Washington, D. C., U. S. A.). Problems of Diagnosis and
Classification in Cardiovascular Renal Mortality Statistics.


The cardiovascular renal diseases have, for years, constituted the 
principal causes of death in the United States. The apparent mortality 
trend and differentials have been questioned. Much of the difference 
has been attributed to changes in medical concepts, problems of diag- 
noses, inadequate reporting of causes of death, and difficulties in identi- 
fying consistently the specific components of cardiovascular renal 
diseases in a single-cause disease classification. Each of these factors 
is examined and an attempt is made to assess their importance in the 
interpretation of mortality statistics relating to cardiovascular renal 
diseases. 
591 JAMES NORTON (Purdue University, Lafayette, Indiana,
U. S. A.). Influence of Weighting Choices on Tests of Main
Effects and Interactions.


With respect to fixed model multiple classifications, the point of 
view is presented that the research worker has the responsibility, and 
should be given the opportunity, to specify the composition of the
"treatment populations" to which he wishes to generalize, and to
decide whether or not he wishes the estimation and test of the effect 
of any particular treatment variable to be independent of the effects 
of the other treatment variables in the design. Formulas are presented 
for testing main effects with arbitrary composition of treatment popu- 
lations, and various special cases are considered and related to previous 
discussions of this topic. It is not possible to set down a single set of 
restrictions on the parameters of the linear model which permit the 
freedom referred to above and at the same time conform to what the 
research worker usually wishes the hypothesis of zero interaction to 
mean. Thus the test for interaction must be separate from, and is 
unaffected by, the weighting choices made in the tests of the main 
effects. 

With respect to mixed model double classifications, some
experimental sampling results are reported which relate to the robustness,
and the limitations, of the usual paired-differences $t$-test in the $r \times 2$ case,
under various distributions of the variances of the differences.


592 R. D. REMINGTON (University of Michigan, Ann Arbor, Michigan,
U. S. A.). Measurement of Blood Pressure Reactivity in Children.


A series of blood pressure stimuli including initial blood pressure 
measurement, cold, breath holding, exercise, CO₂ inhalation, and
postural change is studied. The literature concerning blood pressure 
reactivity is reviewed. 

A standard series of stimuli is applied to four medical students. 
An experiment consisting of six randomly selected Latin Squares is 
designed to test variation between subjects, between tests, between 
various orders of administration of the tests, and day to day variation. 
Only the between-tests difference is consistently significant. 

A similar test series is applied to 2 groups of children aged 8 to 18. 
The groups consist of 50 children having a hypertensive parent and 
50 school children matched by age and sex. Thirty-two determinations 
of blood pressure and 31 determinations of pulse rate are made on each 
subject. 

The children of hypertensive parents show uniformly and
significantly higher mean systolic blood pressure levels and uniformly and
significantly lower mean pulse rates. Mean diastolic blood pressures 
in the two groups are very similar. 

Linear discriminant function analysis of the difference between 
stimulated and pre-stimulated readings for each stimulus reveals no 
difference between the two groups except that standing after a period of 
supine rest causes a greater rise in systolic blood pressure for the children 
of hypertensive parents. 


593 H. C. SWEENY (Atlantic Refining Company, Philadelphia,
Pennsylvania, U. S. A.). The Power of the Analysis of Variance
Test Under Constrained Randomization.


Youden has pointed out that some of the randomizations which 
could arise in assigning treatments to plots are undesirable for individual 
trials, and strict adherence to the randomization procedure (that every 
randomization has an equal chance of being selected) may lead to de- 
signs which are unacceptable to the experimenter. Youden showed that 
it is possible to pick sub-sets of the complete randomization set which, 
under randomization analysis, would give correct expected values of 
mean squares in the analysis of variance. 

Sampling experiments were run to ascertain whether any loss in 
power could be noted for these sub-sets and to examine the degree of 
robustness of the procedure against linear trends of plot effects. In 
general, no loss of power could be found over the range examined. 


594. G. P. WILLIAMS and E. K. HARRIS (Dept. of Health, Education,
and Welfare, Washington, D. C., U.S.A.). The Effect of
Serial Correlation on Principal Components.


The method of principal components has in the past been applied 
to sets of variables measured on independent units, thus eliminating 
the problem of auto-correlated variables. However, in many important 
fields of study, e.g., meteorology and air pollution, economics, environ- 
mental measurements such as stream discharges, etc., cross-correlated 
variables frequently appear as auto-correlated time series. 

The system of auto- and cross-correlations among n variables may be
expressed in the form of a symmetric matrix, consisting of circulant
submatrices, each of which contains the lagged correlations between a 
given pair of variables. The eigenvalues, eigenvectors, and inverses of 
such submatrices are known. 

An analytical solution for the eigenvalues of this matrix has not yet 
been obtained. However, to illustrate the problem, four meteorological 
variables (temperature, relative humidity, barometric pressure, and
wind speed) have been used. Each series consists of daily readings over 
a five-year period. Computations have been carried out on an IBM-704. 
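The abstract states that the eigenvalues and eigenvectors of each circulant submatrix are known. The following is a minimal sketch of that known result, assuming (as the abstract does) that a submatrix is exactly circulant: eigenvalue m is a discrete Fourier transform of the first row, and the Fourier vectors are the eigenvectors. The lag-correlation row and the helper names (`circulant`, `circulant_eigenvalues`, `fourier_vector`, `matvec`) are hypothetical, chosen only for illustration; the sketch says nothing about the full matrix, whose eigenvalues the authors report have no analytical solution yet.

```python
import cmath


def circulant(first_row):
    """Return the n x n circulant matrix whose row j is first_row shifted right by j."""
    n = len(first_row)
    return [[first_row[(k - j) % n] for k in range(n)] for j in range(n)]


def circulant_eigenvalues(first_row):
    """Eigenvalue m is sum_l c[l] * w**(m*l), with w = exp(2*pi*i/n): a DFT of the row."""
    n = len(first_row)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(c * w ** (m * l) for l, c in enumerate(first_row)) for m in range(n)]


def fourier_vector(n, m):
    """Eigenvector shared by every n x n circulant matrix: entries w**(m*k)."""
    w = cmath.exp(2j * cmath.pi / n)
    return [w ** (m * k) for k in range(n)]


def matvec(a, v):
    """Plain matrix-vector product for a list-of-lists matrix."""
    return [sum(a_jk * v_k for a_jk, v_k in zip(row, v)) for row in a]


# Hypothetical lagged correlations over a cycle of length 6:
# lags 0, 1, 2, 3, then lags 2 and 1 again, so c[l] == c[n - l] (symmetric).
row = [1.0, 0.6, 0.3, 0.1, 0.3, 0.6]
C = circulant(row)
eigs = circulant_eigenvalues(row)

# Confirm C v = lambda v for every Fourier vector, with no numerical eigen-solver.
for m, lam in enumerate(eigs):
    v = fourier_vector(len(row), m)
    Cv = matvec(C, v)
    assert all(abs(x - lam * y) < 1e-9 for x, y in zip(Cv, v))

print([round(lam.real, 4) for lam in eigs])
```

For this symmetric row the eigenvalues are real and equal 1 + 1.2 cos(πm/3) + 0.6 cos(2πm/3) + 0.1(−1)^m, that is 2.9, 1.2, 0.2, 0.3, 0.2, 1.2, which sum to the trace, 6.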


595. G. STANLEY WOODSON (Commission on Professional and
Hospital Activities, Inc., Ann Arbor, Michigan, U.S.A.). Contributions
of Multi-Hospital Data to Standardization and Reliability of
Laboratory Determinations.


The Commission on Professional and Hospital Activities, Inc. is a 
non-profit organization, sponsored by various medical and hospital 
associations for the purpose of accomplishing research on the improve- 
ment of the quality of patient care furnished by physicians and hospitals. 

As one of its adjuncts the Commission operates the Professional 
Activity Study, a system designed to furnish the medical staffs of 
hospitals a rapid feed-back of new and useful information on patient 
care and treatment. The application of modern data-processing
machines and methods makes possible a permanent, continuous study that
is sensitive and responsive to current medical problems and practices.

Included in the data collected on patients are certain laboratory 
determinations. Because of the nature of the collection system and the 
rapid feed-back of information to the hospitals in the PAS, it has been 
possible to establish a program which will not only assist local labora- 
tories in establishing their standards, but will contribute directly to 
maintenance of the reliability of such determinations without the con- 
tinual use of standardized solutions. 

This paper presents the development of this program, and some of 
the more interesting results that have emerged from its utilization. 
THE BIOMETRIC SOCIETY


Membership 


The paid-up membership at the end of 1958 was 1454, an increase of 
3 per cent over the previous year. Regional figures were: 


Australasian 61 India 31 
Belgian & Belgian Congo 63 Italian 70 
Brazilian 38 Japan 46 
British 198 Netherlands 43 
Denmark 16 Sweden 13 
ENAR 510 Switzerland 27 
French 78 WNAR 103 
German 103 At Large 54 


The new Council members for the term 1959-1961 are: M.S. Bartlett, 
D. G. Chapman, C. W. Emmens, C. G. Fraga, Jr., A. Lenger, C. C. Li, 
and G. S. Watson.
British 

The Region met with the biological methods group of the Society 
for Analytical Chemistry on February 3, 1959, when the following papers 
were read: 

J. V. Smart and G. A. Stewart—Factors affecting x’ (slope) in insulin 
assay using the mouse convulsion method. 

V. J. Birkinshaw—The use of a range method in estimating variance 
in biological assays.

K. L. Smith—Comparison of the approximate with more refined 
methods for treating 2 + 2 quantal response assays. 


ENAR 


The eleventh annual meeting in Chicago, December 27-28, 1958, 
was held jointly with the American Statistical Association and the 
Institute of Mathematical Statistics. Some of the subjects and speakers 
were: 


Cardiac Problems—Tavia Gordon, R. Remington, Iwao Moriyama. 
Radioactivity—A. M. Dutton and S. L. Crump, A. W. Kimball. 
Computers for Linear Programming and Response Surface Problems—H. O.
Hartley and G. E. P. Box.

Response Surfaces—D. W. Behnken, E. K. Wolfe. 

Multivariate Analysis—H. Smith, E. K. Harris and G. P. Williams, 
R. Gnanadesikan, R. E. Bargmann. 

Unequal Subclass Numbers—J. Norton, D. Gosslee. 

Smoking and Lung Cancer—H. F. Dorn. 


France 


The French Region celebrated its tenth anniversary with a special
meeting on January 28, 1959, at which M. Jean Mothes gave an address
entitled “Des chiffres et de leur bon usage” [On numbers and their proper use].

At a joint meeting with the Société de Génétique held on April 8, 
1959, the programme was as follows: 


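G. Teissier—Application des fonctions discriminantes à un problème
de biométrie génétique [Application of discriminant functions to a
problem of genetic biometry].

Y. Demarly—Quelques aspects de l’évolution génétique des plantes
polyploïdes [Some aspects of the genetic evolution of polyploid plants].

Ph. Merat—La théorie des jonctions dans l’étude de l’inbreeding
pour les gènes liés au sexe [The theory of junctions in the study of
inbreeding for sex-linked genes].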


Netherlands 


The number of members increased slightly during 1958 to 438, and 
close cooperation was maintained with the medical-biological section 
of the Statistical Society and the section for statistical techniques of the 
Netherlands Agricultural Society. The following papers by members 
have been given at meetings of these societies: 


Kruskal and Wallis’s H-test (Verbeck); Medical research without 
statistics (de Jongh); Confidence limits in laboratory experiments
(Fortuin and Hamaker); Application of a-selective methods (Rümke);
Non-parametric test for bias (van Elden); Crop planning (Louwes);
Determining optimum conditions (Leppink); Production curve of 
standard dairy cattle (Hamming); Inheritance of milk production 
(Politick); Path-coefficients (Bosma); Selection based on yield 
(Sieben); Precision of soil analyses (Vermeulen). 


In a joint session with the Netherlands Society for Physiology and 
Pharmacology, papers on the estimation of the ED50 were given by van 
Strik, van Proosdy, Hartzema, Hertzberger, Bezem, and Rümke.
ANNUAL FINANCIAL STATEMENTS


SECRETARY’S IMPREST ACCOUNT


Statement of Income and Expenditure during year ending Dec. 31, 1958

                                                      £    s.   d.
Income
  Balance in hand, 1 Jan., 1958                      509   17   10
  1957 Directory (3 @ 7/-)                             1    1    0
  Ottawa Conference fees (23 @ £3, 1 @ £2.18.0d.)     71   18    0
  Ottawa expenses—repayment                          231    0    2
  British Region dues (1 @ £1.10.0d.)                  1   10    0
                                                     815    7    0

Expenditure
  Office equipment and stationery                      6   12    1
  Secretarial assistance                              75    0    0
  Printing—1957 Directory                            326    3    7
          other                                       22    9   11
  Postage                                             87   13    2
  Ottawa expenses                                    231    0    2
  Bank expenses                                        0    7    9
  Payment to British Region                            1   10    0
                                                     750   16    8

Balance in hand—31 Dec., 1958                         64   10    4

I certify the above to be a true record of my transactions on behalf of the
Biometric Society.

(Signed) M. J. R. Healy,
Secretary.

I have examined the account book and vouchers produced by the Secretary and
certify that the above statement is in accordance therewith.

(Signed) E. Church, A.A.C.C.A.
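The Secretary’s account is kept in pre-decimal sterling (12 pence to the shilling, 20 shillings to the pound), so its columns are easiest to check by converting every entry to pence. The following is a minimal sketch using the figures as transcribed from the statement; bank expenses are read as 7s. 9d. and the "other" printing entry as £22 9s. 11d., readings that the printed totals of £815 7s. 0d. and £750 16s. 8d. bear out. The helper names `to_pence` and `from_pence` are illustrative only.

```python
# Pre-decimal sterling: 12 pence (d.) = 1 shilling (s.), 20 shillings = £1.
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20


def to_pence(pounds, shillings, pence):
    """Convert an amount written as (£, s., d.) to a whole number of pence."""
    return (pounds * SHILLINGS_PER_POUND + shillings) * PENCE_PER_SHILLING + pence


def from_pence(total):
    """Convert a whole number of pence back to (£, s., d.)."""
    shillings, pence = divmod(total, PENCE_PER_SHILLING)
    pounds, shillings = divmod(shillings, SHILLINGS_PER_POUND)
    return (pounds, shillings, pence)


# Entries from the Secretary's Imprest Account for 1958, as (£, s., d.).
income = [(509, 17, 10), (1, 1, 0), (71, 18, 0), (231, 0, 2), (1, 10, 0)]
expenditure = [(6, 12, 1), (75, 0, 0), (326, 3, 7), (22, 9, 11),
               (87, 13, 2), (231, 0, 2), (0, 7, 9), (1, 10, 0)]

total_in = from_pence(sum(to_pence(*a) for a in income))
total_out = from_pence(sum(to_pence(*a) for a in expenditure))
balance = from_pence(to_pence(*total_in) - to_pence(*total_out))
print(total_in, total_out, balance)
```

Working in whole pence avoids the borrowing across shillings and pounds that a column-by-column subtraction needs; the same helpers confirm, for example, that 3 directories at 7/- come to £1 1s. 0d.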


SECRETARY’S OFFICE, BUDGET 1958

Office equipment and stationery¹
Secretarial assistance
Printing
Postage
Travel²

¹ This includes £85 for a dictating machine.
² This is to cover attendance at the German Region meeting in January.
THE BIOMETRIC SOCIETY


TREASURER’S REPORT 1958


BALANCE SHEET

Assets
  Cash: Bank Balance                       $5,353.55
        Petty Cash                             18.90     $5,372.45

Liabilities
  Subscriptions, 1959                          86.50
  Dues, 1959                                   52.25        138.75
  Surplus, January 1, 1958                  3,107.45
  Gain for Period                           2,126.25      5,233.70
                                                         $5,372.45

Audited: Herbert M. Beckler (Signed)
Date: March 24, 1959


INCOME AND EXPENDITURE STATEMENT

Income
  Subscriptions—1957                               $  149.50
  Subscriptions—1958                                4,503.50     $4,653.00
  Dues—1957                                        $   64.50
  Dues—1958                                         2,067.50      2,132.00
  Sustaining memberships—1958                                       520.00
  Back dues and subscriptions                                        55.50
  Regional allotments                              $   92.25
  BIOMETRICS allotments from sustaining members       200.00
  Back issues                                         117.00
  Member subscriptions to Journal of ASA              110.00
  Overpayments                                          3.55
  Payments for directories                             11.00
  UNESCO coupons collected                              6.85
  Refund from Post Office—Knoxville                    24.75
  Refund from Post Office—New Haven                     7.08
                                                   $  572.48
  Less credits and allotments used                    127.70        444.78
Total Income                                                     $7,805.28

Expenditures
  Postage                                                        $   84.35
  Office supplies and stationery                                     70.85
  BIOMETRICS                                                      5,128.51
  Member subscriptions to Journal of ASA                            135.00
  Post Office box rental                                              9.00
  Secretary’s Office: Operating expenses                            201.32
  Desk set for first Secretary of Society                            50.00
Total Expenditures                                                5,679.03
Excess of Income over Expenditures                                2,126.25
                                                                 $7,805.28

Audited: Herbert M. Beckler (Signed)
Date: March 24, 1959


OPERATIONS STATEMENT OF BIOMETRICS VOLUME 14 [1958]

Income (1 February 1959)
  Biometric Society
     603 @ $4.00                           $2,412.00
     844 @ 2.75                             2,321.00
     8 sustaining @ 25.00                     200.00    $ 4,933.00
  675 A.S.A. @ 4.00                                        2,700.00
  969½ Direct @ 7.00                                       6,786.50
  Sale of back issues
     Biometric Society                        165.22
     Editor’s Office                        4,699.39       4,864.61
  Sale of Reprints                                         1,457.81
  Exchange                                                    75.69
  1 E.N.A.R. Membership                                        3.00
                                                        $20,820.61

Expenditures (1 February 1959)
  Cost of Journals
     Printing
        Issue No. 1                        $2,683.64
        Issue No. 2                         2,898.50
        Issue No. 3                         2,617.50
        Issue No. 4                         2,779.67      10,979.31
     Mailing and Express Charges
        Issue No. 1                           182.85
        Issue No. 2                           157.68
        Issue No. 3                           174.16
        Issue No. 4                           191.37         706.06
                                                          11,685.37
Cost of reprints
Issue No. 4 (1957) 
Issue No. 1 

Issue No. 2 

Issue No. 3 

Issue No. 4 


Mailing Charges Reprints 
Issue No. 4 (1957) 

Issue No. 1 

Issue No. 2 

Issue No. 3 

Issue No. 4 


Reprinting March, 1950 


March, 1949 


Operating Expenses 
Biometric Society Credits 
Editor’s Office 


Stamps 

Stationery, mailing envelopes 

Duplicating 

Salary and F.I.C.A. 

Office Supplies 

Telephone calls 

Addressographing 

Express charges 

Refunds on subscriptions and 
overpayments 

Travel 

P. O. Deposit 


Income 
Expenditure 


Surplus 
Non-operating Items 
Income 


Interest on Bond 
Bank Interest 
Insurance policy refunds 


Expenditures 


Safe Custody of Bond 


Credits to 1957 Accounts Receivable 


Surplus 
Gross Surplus 


220.68 
308.11 
363.29 
302.57 


18.29 
18.44 
25.81 
21.90 


631.46 
277.71 


$ 438.71 
474.33 
3.75 
1,349.41 
32.35 


71.85 
324.96 
110.78 


$ 8.42 
162.00 


1,194.65 


84.44 


$ 34.75 


2,559.36 


$ 507.59 


170.42 


337.17 
$ 4,690.04 
1,279.09


909.17 


$ 2,594.11 
16,467.74 
20,820.61 
16,467.74 


$ 4,352.87 
Balance Sheet Biometrics Volume 14
Assets (1 February 1959) 


Accounts Receivable $ 1,294.53 
Bank Balances 

Savings No. 1 $9,795.93 

Savings No. 2 7,100.00 

Checking 8,457.51 25,353.44 

Liabilities 

Subscriptions to Vol. 15 $ 5,386.75 
Subscriptions to Vol. 16 155.75 
Subscriptions to Vol. 17 35.00 
Subscriptions to Vol. 18 7.00 
Balance from previous volumes 16,373.43 
Surplus from Volume 14 4,690.04 


$26,647.97 $26,647.97 


Note: 


Not included in assets is stock of back issues from Volumes 1-14. 
Not included in expenses is printing bill for December reprints, estimated at 
$300.00 but not received as of 1 February 1959. 


Audited March 4, 1959 
R. Bruce Prouty (Signed) 


CHANGES IN MEMBERSHIP 


(January 15—May 1, 1959) 
Changes in Address 


Dr. Donald W. Bailey, Laboratory Aids Branch, DRS, National Insti- 
tutes of Health, Bethesda 14, Maryland, U.S.A. 

Dr. A. H. Carter, 112 Peachgrove Road, Hamilton, New Zealand 

Mr. David B. Christian, 60189 Bremen Hwy., Mishawaka, Indiana, 
U.S.A. 

Dr. Bliss H. Crandall, P.O. Box 86, Provo, Utah, U.S.A. 

Ing. Agr. Luis E. Ramirez Davila, Department of Statistics, North 
Carolina State College, P. O. Box 5457, Raleigh, North Carolina, 
U.S.A.

Mr. Wilford L. Davis, 1801 Parkline Drive, Apt. 14, Pittsburgh 27, 
Pennsylvania, U.S.A. 

Mr. J. B. Douglas, School of Mathematics, University of N.S.W., Box 1, 
Kensington, New South Wales, Australia 

Mr. H. M. Finucan, Mathematics Department, University of Queens- 
land, St. Lucia, Brisbane, Australia 

Mr. Robert Fitzpatrick, 4423 54th Avenue, SW, Seattle 16, Washington, 
U.S.A.
Mr. Alfred E. Garratt, 1524 Lincoln Avenue, Alamogordo, New Mexico,
U.S.A. 

Dr. Hans Gebelein, Bleibiskopfstr. 62, Oberursel/Ts., Germany

Dr. Mordecai H. Gordon, 4527 Chestnut Street, Bethesda 14, Maryland, 
U.S.A. 

Mr. Max Halperin, Knolls Atomic Power Laboratory, General Electric 
Company, Schenectady, New York, U.S.A. 

Mr. Gordon Haskell, Head, Genetics Department, Scottish Horticultural
Research Institute, Mylnefield, Invergowrie by Dundee,
Angus, Scotland 

Prof. Paul G. Homeyer, General Analysis Corporation, 11753 Wilshire 
Boulevard, Los Angeles 25, California, U.S.A. 

Dipl. Math. Gunther Hox, Hoffkugerweg 22, Hamburg, Germany 

Dr. S. Karatas, 520 Buffalo Street, Ithaca, New York, U.S.A.

Mr. James Kerr, Engineering Research, 400 McKnight Road, St. Paul 6, 
Minnesota, U.S.A. 

Mr. Robert Lichter, bei Ladenburg/Neckar, Rosenhof, Germany

Dr. Arthur S. Littell, School of Medicine, Western Reserve University,
Cleveland 6, Ohio, U.S.A. 

Dr. Ole V. Maaløe, Univ. Institute of Microbiology, Øster Farimagsgade
2A, Copenhagen K, Denmark

Mr. Nicholas E. Manos, 9847 Singleton Drive, Washington, D. C., 
U.S.A. 

Dr. Margaret Pearl Martin, Department of Biostatistics, Johns Hopkins 
University, Baltimore 5, Maryland, U.S.A. 

Mr. G. McLoughlin, 21 Timber Lane, Cochituate, Massachusetts, 
U.S.A.

Mr. N. P. Natu, Statistician, 145, Railway Lines, Sholapur, India 

Mr. W. V. Neisius, 701 Wilmot Road, Scarsdale, New York, U.S.A. 

Dr. Jerry S. Olson, 110 West Farragut Road, Oak Ridge, Tennessee,
U.S.A. 

Mr. Joe Powell, Department of Biostatistics, Hutchinson Memorial 
Building, 1430 Tulane Avenue, New Orleans, Louisiana, U.S.A. 

Dr. K. V. Ramachandran, Demographic Center, Chembur, Bombay 38, 
India 

Dr. Philburn Ratoosh, Department of Psychology, University of 
California, Berkeley 4, California, U.S.A. 

Mr. Avo Raud, Granitvagan 8A, Uppsala, Sweden 

Dr. N. Rintelen, Waltharistr. 16, Berlin-Wannsee, Germany 

Mrs. Allegra H. Rodgers, Bristol-Myers Products Division, 225 Long 
Avenue, Hillside, New Jersey, U.S.A. 

Dr. Jane A. Russell, Department of Biochemistry, Emory University, 
Atlanta 22, Georgia, U.S.A. 
Miss Marion M. Sandomire, U. S. Department of Agriculture, 800
Buchanan Street, Albany 10, California, U.S.A. 

Cand. Math. Berthold Schneider, Werderstr. 96, Karlsruhe-Baden, 
Germany 

Dr. Morton D. Schweitzer, 134 West 92nd Street, New York 25, N. Y., 
U.S.A.

Dr. Edward B. Seligmann, Jr., Division of Biologic Standards, National 
Institutes of Health, Bethesda 14, Maryland, U.S.A. 

Dr. Charles E. Shelby, ARS-USDA, 312 New Custom House, Denver 2, 
Colorado, U.S.A. 

Dr. G. F. Sprague, Corn and Sorghum Section, Plant Industry Station, 
Beltsville, Maryland, U.S.A. 

Mr. Herbert Stander, Schering Corporation, Bloomfield, New Jersey, 
U.S.A. 

Dr. Harold E. Young, School of Forestry, University of Maine, Orono,
Maine, U.S.A. 

Dr. Karl F. Zimmerman, Chausseestr. 25, Berlin N. 4, Germany


New Members 
At Large 


Mr. Alfonso Velarde Beristain, Avenida Trujillo 749, Colonia Linda 
Vista, Mexico, D. F. 

Dr. Barend de Loot, 109 Lynnwood Road, Pretoria, South Africa 

Dr. William H. Hathaway, Rockefeller Foundation, Apartado Aéreo
58-13, Bogota, Colombia, South America

Mr. Judson V. McGuire, Jr., Apartado 654, Camagüey, Cuba

Dr. Fernando Orozco-Pinan, Ingeniero Agronomo, Explotacion Agricola, 
“El Encin”, Alcala de Henares, Madrid, Spain 

Dr. Mary Hanania Regier, American University of Beirut, Beirut, 
Lebanon 

Dr. Robert H. Riffenburgh, Department of Mathematics, University 
of Hawaii, Honolulu 14, Hawaii 

Mr. Roberto Sasso, Apartado 186, San Jose, Costa Rica 

Dr. D. E. W. Schumann, Department of Statistics, University of 
Stellenbosch, Stellenbosch, South Africa 

Mr. Dennis Whang, P. O. Box 434, Wahiawa, Oahu, Hawaii 

Mr. Amador D. Yniguez, College of Agriculture, University of the 
Philippines, Laguna, Philippines 


Australasian Region 


Mr. D. J. Daley, 30 Mt. Ida Avenue, Hawthorn East E3, Melbourne, 
Victoria, Australia 
Mr. Robert W. Rutledge, 42 Clontarf Street, Seaforth, N.S.W.,
Australia 


British Region 
Dr. N. R. Draper, 3 Malvern Terrace, Winchester Road, Southampton, 
England 


Mr. Peter Maurice Payne, 227 Croydon Road, Wallington, Surrey, 
England 


French Region 


Mr. Jean Dejardin, 3, Allee La Bruyere, Clichy-Sous-Bois (S-et-O), 
France 

Mr. Robert Flamant, Unite de Recherches Statistiques de l’Institut
Gustave-Roussy, 16 bis, Avenue Paul-Vaillant-Couturier, Villejuif
(Seine) France

Mr. Fernand Nicolas, 12, Cite Beau-Site, Flacé-lès-Mâcon
(Saône-et-Loire) France

Madame Evelyne Orssaud, 5 Ave. Leon Gambetta, Montrouge (Seine) 
France 

Mr. Jean Pagot, Centre de Recherches Zootechniques, B.P. 262, Bamako 
Soudan A.O.F., France 

Mademoiselle Claude Rouquette, 99 Rue de la Pompe, Paris 16e, France 


German Region 


Dr. Heinz Feltz, Max-Planck-Institut für Züchtungsforschung, (17A)
Rosenhof bei Heidelberg, Heidelberg, Germany

Dr. Dietrich Fewson, Emil-Wolff-Str. 34, Hohenheim, Stuttgart, 
Germany 


India 


Dr. G. R. Seth, Indian Council of Agricultural Research, Library 
Avenue, Pusa, New Delhi, India 


Italian Region 


Dr. Luigi De Carli, Istituto di Genetica, Via Sant’Epifanio 14, Pavia,
Italy 

Dr. Bruno De Finetti, Via Poggio Catino 7, Rome, Italy 

Dott. Francesco Mattei, Via Tadolini 13, Rome, Italy 

Prof. Gaetano Pietra, c/o Istituto di Statistica, Università di Padova,
Padova, Italy 

Prof. Dr. Albino Uggè, viale Monza n. 16, Milano (526), Italy
Japan
Mr. Shoji Ura, Keio University, 794 Koganei-shi, Tokyo, Japan 
Netherlands 


Dr. G. J. Fortuin, Philips-Gezondheidscentrum, Willemstraat, Eind- 
hoven, Netherlands 


Sweden 
Fil. kand. Adam A. S. Taube, Thunbergsvägen 19, Uppsala, Sweden
Eastern North American Region 


Mrs. Joann H. Ammerman, 207-12 Airport Road, West Lafayette, 
Indiana, U.S.A. 

Prof. Theodore W. Anderson, Jr., Department of Mathematical Sta- 
tistics, Columbia University, New York 27, New York, U.S.A. 

Mr. Philip G. Archer, Johns Hopkins School of Hygiene and Public 
Health, 615 N. Wolfe Street, Baltimore 5, Maryland, U.S.A. 

Mr. Glenn F. Atkinson, Department of Plant Breeding, Cornell Uni- 
versity, Ithaca, New York, U.S.A. 

Dr. Glenn E. Bartsch, School of Hygiene and Public Health, 615 N. 
Wolfe Street, Baltimore 5, Maryland, U.S.A. 

Mr. Victor C. Beal, Jr., Dairy Department, Purdue University, West 
Lafayette, Indiana, U.S.A. 

Dr. A. W. Bendig, Department of Psychology, University of Pittsburgh, 
Pittsburgh 13, Pennsylvania, U.S.A. 

Dr. Austin W. Berkeley, Department of Psychology, Boston University, 
308 Bay State Road, Boston 15, Massachusetts, U.S.A. 

Dr. J. N. Berrettoni, 19242 Scottsdale Boulevard, Shaker Hts. 22, 
Ohio, U.S.A. 

Dr. Mark Henry Bert, 343 Winter Street, Waltham 54, Massachusetts, 
U.S.A.

Mr. Bernard H. Bissinger, Mathematics Department, Lebanon Valley 
College, Annville, Pennsylvania, U.S.A. 

Mr. Neeti R. Bohidar, Department of Statistics, Iowa State College,
Ames, Iowa, U.S.A. 

Dr. Bertrand Benj. Bohren, Department of Poultry Science, Purdue 
University, Lafayette, Indiana, U.S.A. 

Mr. Elwood L. Bombara, 2011 Lance Road, Huntsville, Alabama, U.S.A. 

Mr. Thomas Howell Bulpitt, 431 P.T. and E., Nela Park, East Cleveland 
12, Ohio 

Mr. Bernard Carol, 1478 Delores Place, Seaford, L. I., New York, U.S.A. 

Dr. Richard Leston Carter, Department of Industrial Engineering, 
Illinois Institute of Technology, Chicago 16, Illinois, U.S.A. 
Mr. Louis J. Celi, Columbian Carbon Company, 380 Madison Avenue,
New York 17, N. Y., U.S.A. 

Mr. William P. Chu, 7901 Loretto Avenue, Philadelphia 11, Penn- 
sylvania, U.S.A. 

Miss Pamela M. Clarke, Statistical Laboratory, Department of Agri- 
culture, Ottawa, Ontario, Canada 

Dr. Willard H. Clatworthy, 380 McClellan Drive, Pittsburgh 36, Penn- 
sylvania, U.S.A. 

Mr. P. P. F. Clay, Division of Applied Biology, National Research 
Council, Sussex Drive, Ottawa, Canada 

Miss Irma Coons, 3-F Materials Eng. Depts., Westinghouse Electric 
Corporation, E. Pittsburgh, Pennsylvania, U.S.A. 

Mr. Arthur S. Covert, 5313 St. James Terrace, Pittsburgh 32,
Pennsylvania, U.S.A.

Mr. David I. Cox, 31 Curtiss Hall, Iowa State College, Ames, Iowa,
U.S.A.

Dr. John Stapley de Cani, 114 West Mount Airy Avenue, Philadelphia 
19, Pennsylvania, U.S.A. 

Mr. Ronald S. Dick, 260-75 73 Avenue, Glen Oaks, New York, U.S.A.

Mr. Roger F. Diffenderfer, 332 Buena Vista Road, Bridgeport 4,
Connecticut, U.S.A.

Mr. David Durand, 52-480, Massachusetts Institute of Technology, 
Cambridge 39, Massachusetts, U.S.A. 

Mr. William A. Ericson, 4 Hancock Pk., Cambridge 38, Massachusetts, 
U.S.A.

Mr. William B. Fetters, 116 Irvington Street, SW, Washington 24, 
D. C., U.S.A. 

Mrs. Elsie D. Foard, U. S. Dept. of Agriculture, ARS, HNRD, 
Washington 25, D. C., U.S.A. 

Dr. D. A. S. Fraser, Mathematics Department, University of Toronto,
Toronto, Canada 

Mr. David Frazier, 4440 Warrensville Center Road, Cleveland 28, 
Ohio, U.S.A. 

Mr. John Frieson, Statistical Laboratory, Science Service, Ottawa, 
Ontario, Canada 

Mr. Granville R. Gargiulo, 35-30 81st Street, Jackson Heights 72, 
New York, U.S.A. 

Dr. Stanley N. Gaunt, Stockbridge Hall, University of Massachusetts, 
Amherst, Massachusetts, U.S.A. 

Dr. Marvin Glasser, Columbia University School of Public Health, 
21 Audubon Avenue, New York, N. Y., U.S.A. 

Mr. William A. Glenn, Box 454, Blacksburg, Virginia, U.S.A. 

Dr. R. Gnanadesikan, 3619 Reading Road, Cincinnati 29, Ohio, U.S.A.
Mr. Albert L. Goldman, Smithsonian Institution, 1113 Dupont Circle
Building, Washington 6, D. C., U.S.A. 

Mr. Leon L. Gordon, Ethicon, Inc., Somerville, New Jersey, U.S.A. 

Dr. David G. Gosslee, Agricultural Experiment Station, Storrs, Con- 
necticut, U.S.A. 

Mr. Clifton W. Gray, Tompkins Hall 123, North Carolina State College, 
Raleigh, North Carolina, U.S.A. 

Miss Mary T. Greene, Botany Department, U.V.M., Burlington, 
Vermont, U.S.A. 

Mr. Herschel N. Hadley, 45 Martin Street, South Acton, Massachusetts, 
U.S.A.

Dr. Wm. Jackson Hall, Box 168, Chapel Hill, North Carolina, U.S.A. 

Mr. Robert T. Hardin, Poultry Department, Purdue University, W. 
Lafayette, Indiana, U.S.A. 

Dr. Don W. Hayne, Institute of Fisheries Research, University 
Museums Annex, Ann Arbor, Michigan, U.S.A. 

Mr. Jack F. Hill, Hy-Line Poultry Farms, Johnstown, Iowa, U.S.A. 

Mr. J. Edward Jackson, 212 Roe Avenue, Blacksburg, Virginia, U.S.A. 

Dr. Fred M. Jacobsen, Box 1537, Texas City, Texas, U.S.A. 

Mr. Howard W. Jespersen, c/o Statistical Laboratory, Iowa State 
College, Ames, Iowa, U.S.A. 

Dr. Gordon H. Josie, 177 Wilbrod Street, Ottawa, Ontario, Canada 

Mr. Marcus Kjelsburg, Tulane Medical School, 1430 Tulane Avenue, 
New Orleans, Louisiana, U.S.A. 

Dr. Mark Kormes, 285 Madison Avenue, New York 17, N. Y., U.S.A. 

Dr. Roy R. Kuebler, Jr., 734 Gimghoul Road, Chapel Hill, North 
Carolina, U.S.A. 

Dr. Solomon Kullback, 1259 Van Buren Street, NW, Washington 12,
D. C., U.S.A. 

Mr. Donald E. Lamphiear, Bendix Systems Division, Ann Arbor, 
Michigan, U.S.A. 

Mr. G. J. Levenbach, 229 Union Avenue, New Providence, New Jersey, 
U.S.A.

Mr. Julian I. Lewis, Department of Medicine and Surgery, Veterans 
Administration, Vermont and Eye Street, NW, Washington 25, D. C., 
U.S.A.

Dr. Benjamin Lipstein, 85 Croton Avenue, Mt. Kisco, New York, 
U.S.A.

Mr. John F. Layman, Babcock Poultry Farm, Inc., Box 286, Ithaca, 
New York, U.S.A. 

Dr. Margaret Wilson Mangel, 110 Gwynn Hall, University of Missouri, 
Columbia, Missouri, U.S.A. 
Mr. Jack A. Marshall, 16 B O’Daniel Avenue, Newark, Delaware,
U.S.A. 

Mr. Frank G. Martin, 1695 Parkline Drive, Apt. 33, Pittsburgh 27, 
Pennsylvania, U.S.A. 

Dr. William B. McIntosh, Department of Zoology and Entomology, 
1735 Neil Avenue, Ohio State University, Columbus 10, Ohio, U.S.A. 

Dr. Harlley E. McKean, Statistical and Computing Laboratory, Purdue 
University, Lafayette, Indiana, U.S.A. 

Mr. Francis X. McLaughlin, 911 Lindale Avenue, Drexel Hill, Penn- 
sylvania, U.S.A. 

Mr. Richard C. McNee, 215 Saratoga, San Antonio, Texas, U.S.A.

Mr. Forest L. Miller, 520 Wall Street, Apt. 9, Lafayette, Indiana, U.S.A. 

Mr. Irving Miller, 2435 Canterbury Road, Cleveland Hts. 18, Ohio, 
U.S.A. 

Miss Josephine Miller, Dept. of Biochemistry and Nutrition, Virginia 
Polytechnic Institute, Blacksburg, Virginia, U.S.A. 

Dr. M. B. Mueller, South Ridgeway Avenue, Glenolden, Pennsylvania, 
U.S.A.

Dr. Jerome L. Myers, Department of Psychology, University of Massa- 
chusetts, Amherst, Massachusetts, U.S.A. 

Dr. M. Dean Nefzger, 6308 E. Halbert Road, Bethesda 14, Maryland, 
U.S.A.

Dr. Nilan Norris, Department of Economics, Hunter College, 695 
Park Avenue, New York 21, N. Y., U.S.A. 

Dr. Edwin G. Olds, 222 Gladstone Road, Pittsburgh 17, Pennsylvania, 
U.S.A.

Mr. Benjamin L. Parnell, 6806 Fairfax Road, Bethesda 14, Maryland, 
U.S.A.

Dr. Mary Ellen Patno, 5530 5th Avenue, Apt. 1A, Pittsburgh 32, Penn- 
sylvania, U.S.A. 

Mr. Kamini Mohan Patwary, 1820 Clydesdale Place, NW, Washington, 
D. C., U.S.A. 

Dr. G. I. Paul, Department of Genetics, McGill University, Montreal, 
Canada 

Prof. Mahlon F. Peck, Box 5152, Va. Tech. Station, Blacksburg, 
Virginia, U.S.A. 

Mr. Kan-Chen Peng, 18052 Pelkey, Detroit 5, Michigan, U.S.A. 

Dr. Mogens Plum, 208 Dairy Industry Building, University of Nebraska, 
Lincoln 3, Nebraska, U.S.A. 

Miss Edith Reid, 432 W. 45th Street, New York 36, N. Y., U.S.A. 

Mr. C. Reimer, Statistical Laboratory, Science Service, Department of 
Agriculture, Ottawa, Ontario, Canada 
Mr. Howard R. Roberts, 5302 Pooks Hill Road, Bethesda 14, Maryland,
U.S.A.

Dr. Albert C. Rohloff, 476 Roff Avenue, Palisades Park, New Jersey, 
U.S.A.

Mr. Richard T. Roth, Crop-Hail Insurance Actuarial Association, 
209 W. Jackson Blvd., Chicago 6, Illinois, U.S.A. 

Dr. Thomas A. Ryan, Department of Psychology, Morrill Hall, Cornell 
University, Ithaca, New York, U.S.A. 

Mr. Garmond G. Schurr, 11541 South Champlain Avenue, Chicago 28, 
Illinois, U.S.A. 

Dr. J. Allen Scott, Department of Preventive Medicine, University of 
Texas Medical Branch, Galveston, Texas, U.S.A. 

Lt. Donald B. Siniff, Box 432, Grafton, Ohio, U.S.A. 

Dr. Alexander Sokoloff, Wm. H. Miner Institute, Chazy, New York, 
U.S.A. 

Mr. Charles R. Sormani, 411 W. Pike Street, Crawfordsville, Indiana, 
U.S.A. 

Mr. John J. Sowinski, 456 Merchandise Mart, Chicago 54, Illinois, 
U.S.A.

Dr. John J. Stansbrey, 314 Wisteria Drive, Dayton 19, Ohio, U.S.A. 

Mr. Selig Starr, 812 University Boulevard East, Silver Spring, Maryland, 
U.S.A.

Mr. Stanley Stavropoulos, 2050 E. 68th Street, Chicago 49, Illinois, 
U.S.A. 

Dr. Robert G. D. Steel, University of Wisconsin, 1118 West Johnson 
Street, Madison 6, Wisconsin, U.S.A. 

Mr. Ray B. Stiver, Jr., 914 Michigan Avenue, Buffalo 3, New York, 
U.S.A.

Dr. Felicitas Svejda, Central Experiment Farm, Ottawa, Canada 

Miss Barbara Taucci, 1954 East 116th Street, Cleveland 6, Ohio, U.S.A. 

Mr. Robert J. Taylor, Box No. 5958, Va. Tech. Station, Blacksburg, 
Virginia, U.S.A. 

Dr. Charles R. M. Tuttle, 1025 Waverly wink Schenectady 8, New 
York, U.S.A. 

Mr. David Valinsky, 278 First Avenue, New York 9, N. Y., U.S.A. 

Mr. Thomas N. Vinson, Catalyst Research Corporation, 6101 Falls 
Road, Baltimore 9, Maryland, U.S.A. 

Mr. K. E. Vroom, Pulp and Paper Research Inst., 3420 University 
Street, Montreal 2, Canada 

Dr. Robert E. Walton, Dairy Section, University of Kentucky, 
Lexington, Kentucky, U.S.A. 

Dr. A. E. R. Westman, Ontario Research Foundation, 43 Queen’s 
Park, Toronto 5, Canada 
Dr. John David Wheat, Dept. of Animal Husbandry, Kansas State
College, Manhattan, Kansas, U.S.A. 

Dr. Phillips Whidden, Aluminum Co. of America, 520 Alcoa Bldg., 
Pittsburgh 19, Pennsylvania, U.S.A. 

Dr. D. Ransom Whitney, Dept. of Mathematics, Ohio State University, 
Columbus 10, Ohio, U.S.A. 

Dr. Martin B. Wilk, Bell Telephone Laboratories, Murray Hill, New 
Jersey, U.S.A. 

Dr. John W. Wilkinson, Westinghouse Research Laboratories, Pitts- 
burgh 35, Pennsylvania, U.S.A. 

Dr. Wm. H. Williams, Dept. of Mathematics, McMaster University, 
Hamilton, Ontario, Canada 

Mr. Jerry Wilson, R. D. 5, Ithaca, New York, U.S.A. 

Dr. James M. Wing, Rt. 3, Box 493, Gainesville, Florida, U.S.A. 

Mr. G. Stanley Woodson, Comm. on Prof. and Hosp. Activities, 211 
First National Bldg., Ann Arbor, Michigan, U.S.A.

Mr. John Arthur Zoellner, General Electric Company, 1 River Road, 
Schenectady, New York, U.S.A. 


Western North American Region 


Miss Martha L. Agan, 1814 Holmby Avenue, Los Angeles 25, Cali- 
fornia, U.S.A. 

Mr. Lawrence A. Appleton, 647 Morse Street, San Jose 26, California, 
U.S.A.

Mr. Donald L. Bentley, 216 Chester Street, Menlo Park, California, 
U.S.A.

Mr. Paul M. Blunk, Box 532, Fair Oaks, California, U.S.A. 

Mr. Jack R. Borsting, Mathematics Department, University of Oregon, 
Eugene, Oregon, U.S.A. 

Mrs. Virginia Clark, 1361 S. Beverly Glen Blvd., Los Angeles 24,
California, U.S.A.

Mr. William F. Dossett, 4535 Santo Tomas Drive, Los Angeles 8, 
California, U.S.A. 

Mr. Bob E. Ellison, P. O. Box 671, Palo Alto, California, U.S.A. 

Mr. Norman R. Garner, 161 N. Brightview Dr., Covina, California, 
U.S.A.

Mr. Alfred C. Hexter, 466 North Street, Oakland 9, California, U.S.A. 

Mr. Harold A. Hoffman, Dept. of Genetics, University of California,
Berkeley 4, California, U.S.A. 

Dr. Melvin L. Hogsett, Heisdorf and Nelson Farms, Box 428, Kirkland, 
Washington, U.S.A. 

Dr. Subodh K. Jain, Dept. of Genetics, University of California, Davis,
California, U.S.A. 
Mr. R. F. Jarrett, 1328 Josephine Street, Berkeley 3, California, U.S.A.

Dr. Raymond J. Jessen, 841 Greentree Road, Pacific Palisades, Cali- 
fornia, U.S.A. 

Mr. Jimmy M. Kada, Orange County Health Department, P. O.
Box 355, Santa Ana, California, U.S.A. 

Mr. Wharton F. Keppler, U. S. Army Ordnance Test Activity, Yuma
Test Station, Yuma, Arizona, U.S.A. 

Dr. H. L. Kravitz, 1800 W. Magnolia Blvd., Burbank, California, 
U.S.A. 

Dr. George M. Kuznets, 207 Giannini Hall, University of California, 
Berkeley 4, California, U.S.A. 

Dr. W. F. Lamoreux, Kimber Farms, Inc., P. O. Box 8, Niles, Cali- 
fornia, U.S.A. 

Mr. William R. Lower, Department of Genetics, University of Cali- 
fornia, Berkeley 4, California, U.S.A. 

Dr. Kuo Hwa Lu, Dept. of Applied Statistics, Utah State University, 
Logan, Utah, U.S.A. 

Mr. Leonard A. Marascuilo, 2645 Shasta Road, Berkeley 8, California, 
U.S.A.

Dr. Wesley L. Nicholson, 1930 S. Hartford, Kennewick, Washington,
U.S.A. 

Dr. Donald B. Owen, 1108 California Street, S.E., Albuquerque, New 
Mexico, U.S.A. 

Dr. Chai Bin Park, 1915 University Ave., Berkeley 4, California, U.S.A. 

Mr. Gerald J. Paulik, Rm. 212, College of Fisheries, University of 
Washington, Seattle 15, Washington, U.S.A. 

Mr. Edward B. Perrin, 315-5 Stanford Village, Stanford, California, 
U.S.A. 

Dr. David D. Rubis, Dept. of Agronomy, University of Arizona, Tucson, 
Arizona, U.S.A. 

Prof. Henry Solomon, Statistics Department, Stanford University, 
Stanford, California, U.S.A. 

Mr. Richard W. Vail, Jr., 508-B Siniieinee Drive, Arcadia, Cali- 
fornia, U.S.A. 

Mr. John P. Vanderbeck, 207 B. Mitscher, China Lake, California, 
U.S.A. 

Mrs. Pearl A. Van Natta, Child Research Council, University of 
Colorado School of Medicine, 4200 East Ninth, Denver, Colorado, 
U.S.A. 

Dr. Louis H. Wegner, Jr., 1700 Main Street, Santa Monica, California, 
U.S.A. 
NEWS AND ANNOUNCEMENTS 


Members are invited to transmit to their National or Regional Secre- 
tary (if members at large, to the General Secretary) news of appointments, 
distinctions, or retirements, and announcements of professional interest. 


Miss Gertrude M. Cox of the Institute of Statistics, North Carolina 
State College, Raleigh, North Carolina, was the 1959 recipient of the 
Oliver Max Gardner Award. The Oliver Max Gardner Award is an- 
nually awarded to the member of the faculty of the Consolidated Uni- 
versity of North Carolina who, during the current scholastic year, has 
made the greatest contribution to the welfare of the human race. The 
award was made to Miss Cox in recognition of her outstanding achieve- 
ments in the development of the Institute of Statistics and for its contri- 
bution to improving the quality of research throughout the Consolidated 
University. 

Dr. Marvin A. Schneiderman was one of eleven selected for an award 
under the Rockefeller Public Service Award Program to carry forward 
job-related educational projects during the academic year, 1959-60. 
Dr. Schneiderman is Section Head, Cancer Chemotherapy National 
Service Center, National Cancer Institute, Department of Health, 
Education, and Welfare. He will make a study in England of methods 
for conducting more efficient clinical trials, particularly in cancer, 
through development of new statistical techniques. Dr. Schneiderman 
will conduct this study at the London School of Hygiene and Tropical 
Medicine. 

John W. Tukey, Professor of Mathematics at Princeton University 
and a member of the mathematical research department of Bell Tele- 
phone Laboratories, has been appointed Assistant Director of Research 
in Communications Principles at the Laboratories. Dr. Tukey will 
continue to serve at Princeton as Professor of Mathematics. 


PUBLIC HEALTH TRAINEESHIPS 


Qualified persons who may be interested in the field of public health 
statistics are eligible to receive grants under a system of Public Health 
Traineeships. Stipends range from a minimum of $200.00 per month 
for a pre-bachelor’s candidate to a maximum of $400.00 per month for 
a post-doctoral candidate. Additional allowances are provided for de- 
pendents, travel of the trainee, and academic tuition and fees. 

Additional information and application forms may be secured from 
any of the Regional Medical Directors of the Public Health Service or 
from the Chief, Division of General Health Services, Bureau of State 
Services, Public Health Service, Department of Health, Education, and 
Welfare, Washington 25, D. C. 


OBITUARY 


Dr. phil. Wilhelm Ludwig, Professor of Zoology at Heidelberg and 
long-standing member of our Society, died suddenly in Leipzig on 
January 23, 1959. He was co-founder of our Region and Secretary- 
Treasurer since its inception. 

Always selfless in using his whole strength and personality for the 
good of our Region, he was unfortunately not granted the opportunity 
of enriching our Sixth Biometric Colloquium with his wealth of ideas 
and lively sense of humour. As initiator and indefatigable supporter of 
Biometry and Biomathematics in Germany, he will live on among us 
through his publications and stimulating suggestions. 


The Executive of the German Region 
of the Biometric Society 
Dr. Reimut Wette, Secretary 
TECHNOMETRICS 


A Journal of Statistics for the 
Physical, Chemical, and Engineering Sciences 


Vol. 1, No. 1 February, 1959 


CONTENTS 


Response Surface Designs for Three Factors at Three Levels, R. DeBaun .......... 1 

The Analysis of Life Test Data, R. L. Plackett .......... 9 

Mathematical Probability in the Natural Sciences, R. A. Fisher .......... 21 

A Quick Compact Two Sample Test to Duckworth’s Specifications, J. W. Tukey .......... 31 

Some Statistical Aspects of the Economics of Analytical Testing, O. L. Davies .......... 49 

Partial Duplication of Factorial Experiments, O. Dykstra .......... 63 

Condensed Calculations for Evolutionary Operation Programs, G. E. P. Box and J. S. Hunter .......... 77 


Notices 


Technometrics is published quarterly in February, May, August, and 
November. The annual non-member subscription rate is $8.00. Special 
rates are available to members of the American Statistical Association and 
the American Society for Quality Control. Inquiries should be addressed 
to either Technometrics, American Statistical Association, 404 Beacon Bldg., 
1757 K Street N.W., Washington 6, D. C. or Technometrics, American Society 
for Quality Control, Rm. 6197, Plankinton Bldg., 161 Wisconsin Ave., Mil- 
waukee 3, Wisconsin. 
INFORMATION FOR CONTRIBUTORS 


MANUSCRIPTS 


Contributions for Biometrics may be addressed to Dr. Ralph A. Bradley, Depart- 
ment of Statistics, Virginia Polytechnic Institute, Blacksburg, Virginia, U.S.A.; 
authors residing in the following Society Regions can expedite consideration of papers 
by submitting them to the appropriate Associate Editor, namely: BRITISH REGION: Dr. S. C. Pearce, East Malling Research Station, East Malling, Maidstone, 
Kent, England; AUSTRALASIAN REGION: Dr. E. A. Cornish, University of 
Adelaide, Adelaide, Australia; FRENCH REGION: Dr. Georges Teissier, Faculté 
des Sciences de Paris, 1 rue V. Cousin, Paris, France. QUERIES, NOTES, and 
related correspondence should be directed to Dr. D. J. Finney, Department of 
Statistics, University of Aberdeen, Meston Walk, Old Aberdeen, Scotland. 

MANUSCRIPTS must be submitted in triplicate, with typescript doublespaced 
throughout. Marginal notes may obviate typographical difficulties presented by 
complicated formulae or tables—authors should not attempt editorial instructions 
or markings for the printer. TABLES should be identified by arabic number and 
by a short descriptive title. ILLUSTRATIONS should also be identified by arabic 
number and by a brief caption. (Captions should not be included in illustrations, 
but should be typewritten collectively on an accompanying sheet.) Originals 
should be approximately 8.5 x 11 in. (21.5 x 28 cm.). The original of each chart, 
diagram, or graph should be executed in black on white drawing paper or board, on 
blue tracing linen, or on coordinate paper ruled in blue only; coordinate lines to be 
reproduced should be ruled in black. For printing, illustrations may be reduced to 
¥ or ¥ original dimensions. Lines should therefore be of sufficient thickness, and 
decimal points, periods, and stippled dots should be solid black circles large enough 
to reproduce well. Lettering and numerals should be at least 1 mm. high when 
reproduced in a cut 3 in. (7.5 cm.) wide. Photographs should be prints on glossy 
paper with strong contrasts, and if grouped in a plate should be mounted contig- 
uously. All tables and illustrations should be mentioned explicitly in the text. 
REFERENCES (BIBLIOGRAPHIC) should be collectively listed alphabetically 
by author; textual citation by author and year is preferred. 


ABSTRACTS 


Abstracts of papers presented at meetings of the Biometric Society or of its 
regions are printed in Biometrics following such meetings. They should be submitted 
to the person designated to receive them for a particular meeting in exactly the form 
published in Biometrics (except for an Abstract Number), doublespaced on bond 
paper, and in duplicate. Use of formulae requiring display printing is to be avoided. 


NOTICES, ANNOUNCEMENTS, AND BIOMETRIC SOCIETY REPORTS 


International and regional reports and notices should be submitted by the 
appropriate officers of the Society and its Regions in duplicate doublespaced on 
separate sheets exactly as they are to be printed in Biometrics. Other material to 
be printed in News and Announcements should also be submitted doublespaced 
and in duplicate. 


SUSTAINING MEMBERS OF THE BIOMETRIC SOCIETY 


Abbott Laboratories 

American Cancer Society, Inc. 

Heisdorf and Nelson Farms, Inc. 

Merck, Sharp and Dohme Research Laboratories 

Schering Corporation 

Smith, Kline and French Laboratories 

E. R. Squibb and Sons 

Wallace Laboratories, Division of Carter Products 

Wyeth Institute of Applied Biochemistry 
BACK ISSUES 


Back issues of Biometrics are available at the following postage-paid 
prices in U.S.A. currency: 

Year    Volume    Number     Price per Single Number    Price per Volume (unbound) 
1945     1        1 to 6 
1946     2        1 to 6 
1947     3        1 to 4 
1948     4        1 to 4 
1949     5        1 to 4 
1950     6        1 to 4 
1951     7        1 to 4 
1952     8        1 to 4 
1953     9        1 to 4 
1954    10        1 to 4 
1955    11        1 to 4 
1956    12        1 to 4 
1957    13        1 to 4 
1958    14        1 to 4 
1 
1. 
1. 
1. 
2. 
2 
2 
2 
2 
2 
2 
2. 


Reprints of individual articles are not available except to authors at the 
time of printing. Three special issues are among the numbers listed 
above. They are: 


1947 Volume 3 Number 1 The Analysis of Variance 
1951 Volume 7 Number 1 Components of Variance 
1957 Volume 13 Number 3 The Analysis of Covariance 


Also available are: 
Fishery Reprint Series (Selected reprints from Vol. 5) $1.00 
Subject Index (Volumes 1-10) 1.00 
Proceedings, International Biometric Symposium, 
Campinas, Brazil, 1955. 1.00 


Inquiries, non-member subscriptions, and orders for back issues and 
other material listed above should be addressed to: BIOMETRICS, DEPARTMENT OF STATISTICS, VIRGINIA POLYTECHNIC INSTITUTE, BLACKSBURG, 
VIRGINIA, U.S.A. 